[MOBY-dev] sequence datatypes

Tue Dec 12 20:36:11 UTC 2006

Hi,

I'd like to explain a little about how we use biomoby, and also to
request feedback.

We're using biomoby mainly with taverna workflows, and gradually
migrating current web services over to become biomoby services (under
biomoby.renci.org).  The workflows we develop talk to services that
for the most part are hosted here on our own servers.  As a result
we end up passing a very large amount of duplicated sequence data over
the network between taverna and services, often more data than taverna
is happy about.  To get around this we have started passing sequences
by reference using a FASTA-like format that is non-standard but fits
well into our system and the taverna UI.  I'm calling this the "RENCI
sequence" format.  It is essentially FASTA, except that it also allows
an "abbreviated" (truncated) form consisting of only a partial header
line with at least one namespace/id.  (The architecture
is described in http://www.renci.org/~nassar/sequence_registry.ppt )
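
To make the full vs. abbreviated distinction concrete, here is a rough
sketch in Python.  The header syntax it assumes (a ">" followed by
"namespace:id" tokens separated by "|") is purely hypothetical and
chosen for illustration; it is not the actual RENCI sequence syntax.
The names SequenceRecord and parse_record are likewise made up.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class SequenceRecord:
    refs: List[Tuple[str, str]]  # (namespace, id) pairs found in the header
    residues: Optional[str]      # None when the record is abbreviated


def parse_record(text: str) -> SequenceRecord:
    """Parse a FASTA-like record that may omit the sequence body.

    ASSUMPTION: the header looks like ">ns1:id1|ns2:id2". This layout is
    hypothetical and only stands in for "a header line with at least one
    namespace/id".
    """
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a header line starting with '>'")

    refs = []
    for token in lines[0][1:].split("|"):
        namespace, sep, ident = token.strip().partition(":")
        if sep and namespace and ident:
            refs.append((namespace, ident))
    if not refs:
        raise ValueError("header needs at least one namespace/id")

    # No body lines means the sequence is passed by reference only.
    residues = "".join(lines[1:]) or None
    return SequenceRecord(refs, residues)


# Full form: header plus sequence data.
full = parse_record(">example_ns:12345\nACGTACGT\nTTGA\n")
# Abbreviated form: header only; a service would resolve the reference.
abbrev = parse_record(">example_ns:12345")
```

With this layout, a service receiving a record whose residues field is
None would look the sequence up by its namespace/id instead of reading
it from the message, which is what keeps the bulk data off the wire.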

We've added some new datatypes under "RenciSequence" for this purpose,
analogous to the existing "GenericSequence".  In general we are using
the existing biomoby datatypes, but for sequences our format seems
unusual enough that we thought it needed its own datatype to avoid
confusion.

Nassib