[MOBY-dev] sequence datatypes

Oswaldo Trelles ots at ac.uma.es
Thu Dec 21 07:52:28 UTC 2006


Hi,

We are aware of the network overload caused by transferring large 
amounts of data, even when the services are located on the same local 
network or even on the same server (e.g. most of our services are hosted 
on two servers, so in general transmitting intermediate data is unnecessary).

Along these lines we are analysing two alternatives:
a) Sub-workflows (or jobs) that submit a set of tasks to be executed on 
the same server (a scheduler module, as in the case of MOWServ, could 
identify the set of related tasks). We need to review the documentation 
to ensure that only the last service returns its output (intermediate 
results could be requested using the asynchronous protocol).
b) A more general (ergo, more interesting) alternative is to define a 
pointer/reference to a local object (an object currently stored in the 
local file system). Of course we should avoid defining a new object 
(the corresponding paired object) for each existing object (sequences, 
gene-expression data, structures, etc., and their subtypes). So we could 
agree on a label/tag, or some other mechanism, at the appropriate 
position in the XML to identify this kind of specification.
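
To make alternative (b) a little more concrete, here is a minimal sketch in 
Python of how a service might accept the same object either by value or by 
local reference. Everything specific in it is an assumption for illustration 
only: the `localRef` attribute name, the `resolve_payload` helper and the idea 
of a shared base directory are hypothetical, and nothing like them has been 
agreed on or exists in the MOBY API.

```python
import os
import xml.etree.ElementTree as ET

# Hypothetical attribute name; no label/tag has been agreed on yet.
LOCAL_REF_ATTR = "localRef"


def resolve_payload(xml_text, base_dir):
    """Return the sequence string, passed either by value or by local reference."""
    obj = ET.fromstring(xml_text)
    ref = obj.get(LOCAL_REF_ATTR)
    if ref is not None:
        # By reference: the data is already on this server's file system.
        root = os.path.realpath(base_dir)
        path = os.path.realpath(os.path.join(root, ref))
        # Refuse references that point outside the shared directory.
        if os.path.commonpath([path, root]) != root:
            raise ValueError("reference escapes the shared directory")
        with open(path, encoding="ascii") as handle:
            return handle.read().strip()
    # By value: fall back to the inline SequenceString child.
    for child in obj:
        if child.get("articleName") == "SequenceString":
            return (child.text or "").strip()
    raise ValueError("object carries neither a local reference nor inline data")
```

Because the marker is an attribute on the existing element, no paired 
"reference" datatype is needed for each object type; a service that does not 
understand the attribute would simply see an object without inline data.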

Of course, these are only initial ideas about a real problem that 
should be discussed (and solved).

best regards, O.

GNV5-INB




Nassib Nassar wrote:
> Hi Paul,
>
> Sorry for the slow reply....  What you suggest was our original
> intention, but we found it too complicated to explain the difference
> at the taverna level between passing data into the namespace/id
> vs. value fields.  More importantly, I think, it's convenient for the
> workflow developers to be able to pass sequences either by reference
> or by value along a single pathway, anywhere in a workflow where
> sequences are being processed.  The register and lookup services are
> used like filters to abbreviate and expand sequences, but all of our
> services will accept either the standard or abbreviated forms.  This
> is rather experimental, but so far it seems to be working very well.
>
> Nassib
>
>
> On Tue, Dec 12, 2006 at 02:40:45PM -0700, Paul Gordon wrote:
>   
>> Hi Nassib,
>>
>> I looked at the presentation, and I'm not sure why you can't just use a 
>> VirtualSequence instead.  You can then have all of the combinations you 
>> want, as long as you register the namespaces:
>>
>> <VirtualSequence articleName="foo" namespace="renci_global" id="bar">
>> <Integer articleName="Length" namespace="" id="">1500</Integer>
>> </VirtualSequence>
>>
>> <VirtualSequence articleName="foo" namespace="renci_user" id="baz">
>> <Integer articleName="Length" namespace="" id="">1500</Integer>
>> </VirtualSequence>
>>
>> <VirtualSequence articleName="foo" namespace="NCBI_gi" id="123456">
>> <Integer articleName="Length" namespace="" id="">1500</Integer>
>> </VirtualSequence>
>>
>> <DNASequence articleName="foo" namespace="any" id="qux">
>> <Integer articleName="Length" namespace="" id="">1500</Integer>
>> <String articleName="SequenceString" namespace="" id="">ATG...</String>
>> </DNASequence>
>>
>> etc., etc.
>>     
>>> Hi,
>>>
>>> I'd like to start explaining a little bit about our use of biomoby and
>>> also request feedback...
>>>
>>> We're using biomoby mainly with taverna workflows, and gradually
>>> migrating current web services over to become biomoby services (under
>>> biomoby.renci.org).  The workflows we develop are talking to services
>>> that for the most part are based here within our servers.  As a result
>>> we end up passing a very large amount of duplicated sequence data over
>>> the network between taverna and services, often more data than taverna
>>> is happy about.  To get around this we have started passing sequences
>>> by reference using a FASTA-like format that is non-standard but fits
>>> well into our system and the taverna UI.  I'm calling this the "RENCI
>>> sequence" format, and it's basically similar to GenBank, while
>>> allowing an "abbreviated" (truncated) form that consists of only a
>>> partial header line with at least one namespace/id.  (The architecture
>>> is described in http://www.renci.org/~nassar/sequence_registry.ppt )
>>>
>>> We've added some new datatypes under "RenciSequence" for this purpose,
>>> analogous to the existing "GenericSequence".  In general we are using
>>> the existing biomoby datatypes, but for sequences our format seems
>>> unusual enough that we thought it needed its own datatype to avoid
>>> confusion.
>>>
>>> Nassib
>>> _______________________________________________
>>> MOBY-dev mailing list
>>> MOBY-dev at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/moby-dev
>>>
>>>
>>>   
>>>       


