[MOBY-dev] sequence datatypes

Wed Dec 20 21:43:05 UTC 2006

Hi Paul,

Sorry for the slow reply.  What you suggest was our original
intention, but we found it too complicated to explain, at the Taverna
level, the difference between passing data into the namespace/id
fields vs. the value fields.  More importantly, I think, it's
convenient for workflow developers to be able to pass sequences either
by reference or by value along a single pathway, anywhere in a
workflow where sequences are being processed.  The register and lookup
services act as filters that abbreviate and expand sequences,
respectively, but all of our services accept either the standard or
the abbreviated form.  This is rather experimental, but so far it
seems to be working very well.
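
To make the by-reference/by-value idea concrete, here is a rough
sketch in Python.  It is only an illustration: the ">namespace:id"
header syntax, the in-memory registry, and every function name below
are assumptions made up for this example, not the actual RENCI
sequence format or service interface.

```python
# Hypothetical sketch: the ">namespace:id" header syntax and all names here
# are assumptions for illustration, not the real RENCI sequence format.
from typing import Dict, List, Tuple

# Stand-in for the sequence registry, keyed by (namespace, id).
_REGISTRY: Dict[Tuple[str, str], str] = {}


def _lines(record: str) -> List[str]:
    """Return the non-empty lines of a record, stripped of whitespace."""
    return [ln.strip() for ln in record.splitlines() if ln.strip()]


def parse_header(record: str) -> Tuple[str, str]:
    """Extract (namespace, id) from the header line of a record."""
    lines = _lines(record)
    if not lines or not lines[0].startswith(">"):
        raise ValueError("record must start with a '>' header line")
    tokens = lines[0][1:].split()
    if not tokens:
        raise ValueError("header line is empty")
    namespace, _, seq_id = tokens[0].partition(":")
    if not namespace or not seq_id:
        raise ValueError("header needs at least one namespace:id")
    return namespace, seq_id


def is_abbreviated(record: str) -> bool:
    """A record with only a header line carries the sequence by reference."""
    return len(_lines(record)) == 1


def register(record: str) -> str:
    """Filter: store a full record and return its abbreviated form."""
    namespace, seq_id = parse_header(record)
    if not is_abbreviated(record):
        _REGISTRY[(namespace, seq_id)] = "".join(_lines(record)[1:])
    return f">{namespace}:{seq_id}"


def lookup(record: str) -> str:
    """Filter: expand an abbreviated record; pass full records through."""
    namespace, seq_id = parse_header(record)
    if is_abbreviated(record):
        residues = _REGISTRY[(namespace, seq_id)]  # KeyError if unregistered
    else:
        residues = "".join(_lines(record)[1:])
    return f">{namespace}:{seq_id}\n{residues}"


def sequence_length(record: str) -> int:
    """Example service: accepts either the full or the abbreviated form."""
    return len(lookup(record).splitlines()[1])
```

With this sketch, a workflow could call register() once near the
start, pass the short form between steps, and any service such as
sequence_length() would give the same answer for either form.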

Nassib

On Tue, Dec 12, 2006 at 02:40:45PM -0700, Paul Gordon wrote:
> Hi Nassib,
> 
> I looked at the presentation, and I'm not sure why you can't just use a 
> VirtualSequence instead.  You can then have all of the combinations you 
> want, as long as you register the namespaces:
> 
> <VirtualSequence articleName="foo" namespace="renci_global" id="bar">
> <Integer articleName="Length" namespace="" id="">1500</Integer>
> </VirtualSequence>
> 
> <VirtualSequence articleName="foo" namespace="renci_user" id="baz">
> <Integer articleName="Length" namespace="" id="">1500</Integer>
> </VirtualSequence>
> 
> <VirtualSequence articleName="foo" namespace="NCBI_gi" id="123456">
> <Integer articleName="Length" namespace="" id="">1500</Integer>
> </VirtualSequence>
> 
> <DNASequence articleName="foo" namespace="any" id="qux">
> <Integer articleName="Length" namespace="" id="">1500</Integer>
> <String articleName="SequenceString" namespace="" id="">ATG...</String>
> </DNASequence>
> 
> etc., etc.
>
> > Hi,
> >
> > I'd like to start explaining a little bit about our use of BioMOBY and
> > also request feedback...
> >
> > We're using BioMOBY mainly with Taverna workflows, and gradually
> > migrating current web services over to become BioMOBY services (under
> > biomoby.renci.org).  The workflows we develop are talking to services
> > that for the most part are based here within our servers.  As a result
> > we end up passing a very large amount of duplicated sequence data over
> > the network between Taverna and services, often more data than Taverna
> > is happy about.  To get around this we have started passing sequences
> > by reference using a FASTA-like format that is non-standard but fits
> > well into our system and the Taverna UI.  I'm calling this the "RENCI
> > sequence" format, and it's basically similar to GenBank, while
> > allowing an "abbreviated" (truncated) form that consists of only a
> > partial header line with at least one namespace/id.  (The architecture
> > is described in http://www.renci.org/~nassar/sequence_registry.ppt )
> >
> > We've added some new datatypes under "RenciSequence" for this purpose,
> > analogous to the existing "GenericSequence".  In general we are using
> > the existing BioMOBY datatypes, but for sequences our format seems
> > unusual enough that we thought it needed its own datatype to avoid
> > confusion.
> >
> > Nassib
> > _______________________________________________
> > MOBY-dev mailing list
> > MOBY-dev at lists.open-bio.org
> > http://lists.open-bio.org/mailman/listinfo/moby-dev