[MOBY-dev] sequence datatypes

Mark Wilkinson markw at illuminae.com
Fri Dec 29 15:33:13 UTC 2006


Hi all,

Just a quick question regarding this thread: the work that Heiko and I did
together a couple of weeks ago while I was in Germany allows services to be
provided over plain HTTP POST.  The idea is that you pass a <MOBY.../>
(rather than <SOAP.../>) message into a service by POST and it returns a
<MOBY.../> message in response.  It seems to me (though we haven't tried
it, or written any code for it yet) that it would be possible to set up:

a)  a service that "streams" its output continuously, rather than waiting  
for the service to finish entirely, and

b)  A client-side SAX parser to deal with the "streaming" output.
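
A minimal sketch of what (a) might look like, in Python.  Nothing here
is existing MOBY code: the function name stream_moby_response, the
unprefixed element names and the queryID attribute are simplifications
assumed for illustration, not the real MOBY message schema.

```python
from typing import Iterable, Iterator, Tuple
from xml.sax.saxutils import escape, quoteattr


def stream_moby_response(results: Iterable[Tuple[str, str]]) -> Iterator[bytes]:
    """Yield a MOBY-style response piece by piece instead of building it whole.

    `results` is any iterable of (query_id, payload) pairs; it can itself be
    a generator, so each result is sent as soon as it has been computed.
    """
    # Send the envelope first so the client receives bytes immediately.
    yield b'<?xml version="1.0" encoding="UTF-8"?>\n<MOBY><mobyContent>\n'
    for query_id, payload in results:
        # One self-contained block per result; text and attribute are escaped.
        block = "<mobyData queryID=%s>%s</mobyData>\n" % (
            quoteattr(query_id),
            escape(payload),
        )
        yield block.encode("utf-8")
    # Close the envelope only once every result has been written.
    yield b"</mobyContent></MOBY>\n"
```

A generator like this could be handed to anything that writes an
iterable of byte strings to the HTTP response body (a WSGI application
returns exactly such an iterable), so the whole message never has to be
held in memory on the server.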

The SAX parser will be quite tricky to build, given that a MOBY service is
allowed to output any object that is a child of whatever object it
registered (so the trigger would have to respond to every case).  Still,
it seems that this architecture would solve the problem we are discussing
here quite simply.  It might also overcome *some* of the timeout issues,
since the service could output the header information straight away (and
perhaps even some header "commentary" at regular intervals) to keep the
connection alive.
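
For the client side, here is a rough sketch using Python's standard
xml.sax module, whose default expat-based parser can be fed data
incrementally.  It sidesteps the "every case" problem by triggering on
the enclosing wrapper element rather than on the object type inside it;
whether that is acceptable for real MOBY clients is an open question.
The names MobyDataHandler and parse_stream, the unprefixed mobyData
element and the queryID attribute are assumptions made for the example.

```python
import xml.sax


class MobyDataHandler(xml.sax.ContentHandler):
    """Report the text inside each <mobyData> element as soon as it closes."""

    def __init__(self, on_result):
        super().__init__()
        self.on_result = on_result  # callback(query_id, text)
        self.depth = 0              # > 0 while inside a mobyData element
        self.query_id = None
        self.text = []

    def startElement(self, name, attrs):
        if name == "mobyData" and self.depth == 0:
            self.depth = 1
            self.query_id = attrs.get("queryID")
            self.text = []
        elif self.depth:
            # Any child object type is accepted; we only track nesting.
            self.depth += 1

    def characters(self, content):
        if self.depth:
            self.text.append(content)

    def endElement(self, name):
        if self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.on_result(self.query_id, "".join(self.text).strip())


def parse_stream(chunks, on_result):
    """Feed byte chunks to the parser as they arrive from the network."""
    parser = xml.sax.make_parser()
    parser.setContentHandler(MobyDataHandler(on_result))
    for chunk in chunks:
        parser.feed(chunk)
    parser.close()
```

Since a ContentHandler is never told about XML comments, a service
could emit comments as keep-alive "commentary" without the client
having to treat them specially.  Usage would be along the lines of
parse_stream(response_chunks, print), where response_chunks is whatever
iterable of byte strings the HTTP library provides.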

Regarding legacy: existing clients would not discover these services, because
they are registered as Category="moby-post", so there is no risk of
breaking anything for anyone.

Does this help at all??

M





On Wed, 20 Dec 2006 23:52:28 -0800, Oswaldo Trelles <ots at ac.uma.es> wrote:

> Hi,
>
> We are aware of the problem of network overload produced by large
> data transmission, even when services are located in local networks or
> even in the same server (i.e. most of our services are located in two
> servers, so in general, partial data transmission is unnecessary).
>
> Along these lines we are analysing two alternatives:
> a) sub-workflows (or jobs) to submit a set of tasks to be executed on
> the same server (a scheduler module, as in the case of MOWServ, could
> identify the set of related tasks). We need to review the documentation
> to ensure that only the last service returns the output (intermediate
> results could be requested using the asynch protocol).
> b) a more general (ergo, more interesting) alternative is to define
> a pointer/reference to a local object (an object currently
> stored in the local file system). Of course we should avoid defining a
> new object (the corresponding couple-object) for each object (sequences,
> gene-expression data, structures, etc. and subtypes). So we could
> agree on a label/tag, or some other alternative, in the appropriate
> position of the XML to recognise this type of specification.
>
> of course these are only initial ideas regarding a real problem that
> should be discussed (and solved)
>
> best regards, O.
>
> GNV5-INB
>
>
>
>
> Nassib Nassar wrote:
>> Hi Paul,
>>
>> Sorry for the slow reply....  What you suggest was our original
>> intention, but we found it too complicated to explain the difference
>> at the taverna level between passing data into the namespace/id
>> vs. value fields.  More importantly, I think, it's convenient for the
>> workflow developers to be able to pass sequences either by reference
>> or by value along a single pathway, anywhere in a workflow where
>> sequences are being processed.  The register and lookup services are
>> used like filters to abbreviate and expand sequences, but all of our
>> services will accept either the standard or abbreviated forms.  This
>> is rather experimental, but so far it seems to be working very well.
>>
>> Nassib
>>
>>
>> On Tue, Dec 12, 2006 at 02:40:45PM -0700, Paul Gordon wrote:
>>
>>> Hi Nassib,
>>>
>>> I looked at the presentation, and I'm not sure why you can't just use a
>>> VirtualSequence instead.  You can then have all of the combinations you
>>> want, as long as you register the namespaces:
>>>
>>> <VirtualSequence articleName="foo" namespace="renci_global" id="bar">
>>> <Integer articleName="Length" namespace="" id="">1500</Integer>
>>> </VirtualSequence>
>>>
>>> <VirtualSequence articleName="foo" namespace="renci_user" id="baz">
>>> <Integer articleName="Length" namespace="" id="">1500</Integer>
>>> </VirtualSequence>
>>>
>>> <VirtualSequence articleName="foo" namespace="NCBI_gi" id="123456">
>>> <Integer articleName="Length" namespace="" id="">1500</Integer>
>>> </VirtualSequence>
>>>
>>> <DNASequence articleName="foo" namespace="any" id="qux">
>>> <Integer articleName="Length" namespace="" id="">1500</Integer>
>>> <String articleName="SequenceString" namespace="" id="">ATG...</String>
>>> </DNASequence>
>>>
>>> etc., etc.
>>>
>>>> Hi,
>>>>
>>>> I'd like to start explaining a little bit about our use of biomoby and
>>>> also request feedback...
>>>>
>>>> We're using biomoby mainly with taverna workflows, and gradually
>>>> migrating current web services over to become biomoby services (under
>>>> biomoby.renci.org).  The workflows we develop are talking to services
>>>> that for the most part are based here within our servers.  As a result
>>>> we end up passing a very large amount of duplicated sequence data over
>>>> the network between taverna and services, often more data than taverna
>>>> is happy about.  To get around this we have started passing sequences
>>>> by reference using a FASTA-like format that is non-standard but fits
>>>> well into our system and the taverna UI.  I'm calling this the "RENCI
>>>> sequence" format, and it's basically similar to GenBank, while
>>>> allowing an "abbreviated" (truncated) form that consists of only a
>>>> partial header line with at least one namespace/id.  (The architecture
>>>> is described in http://www.renci.org/~nassar/sequence_registry.ppt )
>>>>
>>>> We've added some new datatypes under "RenciSequence" for this purpose,
>>>> analogous to the existing "GenericSequence".  In general we are using
>>>> the existing biomoby datatypes, but for sequences our format seems
>>>> unusual enough that we thought it needed its own datatype to avoid
>>>> confusion.
>>>>
>>>> Nassib
>>>> _______________________________________________
>>>> MOBY-dev mailing list
>>>> MOBY-dev at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/moby-dev
>>>>
>>>>
>>>>
>>>>



-- 
Mark Wilkinson
Assistant Professor, Dept. Medical Genetics
University of British Columbia
PI Bioinformatics
iCAPTURE Centre, St. Paul's Hospital
