[MOBY-dev] Re: Problems with Biomoby services in Taverna 1.2

Heiko Schoof schoof at mpiz-koeln.mpg.de
Thu Jul 7 15:34:39 UTC 2005


Please, let's take a minute to review the use of Collections in 
BioMoby before we go off and create "workarounds"...
I myself am confused about the use of Collections. Originally I 
understood Collections as a construct that lets objects which 
inherently belong together be "bagged". Example 1: a multiple 
alignment program that takes any number of sequences as input. 
Example 2: a keyword search that takes any number of keywords and 
runs one query combining all of them. As opposed to: inputting a list 
of keywords and running the keyword search for each of them 
separately, which would require a separate moby:mobyData and queryID 
for each keyword.
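The two request shapes for the keyword example can be sketched with Python's standard xml.etree.ElementTree. This is a minimal sketch, assuming the MOBY envelope layout MOBY > mobyContent > mobyData (with a queryID) > Simple/Collection > Object, and assuming the namespace URI http://www.biomoby.org/moby; the keyword values, the "Keyword" namespace name and the articleName values are hypothetical.

```python
import xml.etree.ElementTree as ET

MOBY_NS = "http://www.biomoby.org/moby"  # assumed BioMoby namespace URI
ET.register_namespace("moby", MOBY_NS)


def q(name):
    """Qualify a tag or attribute name with the moby namespace."""
    return f"{{{MOBY_NS}}}{name}"


def simple(namespace, ident, article_name=None):
    """A Simple article wrapping one Object identified by namespace + id."""
    el = ET.Element(q("Simple"))
    if article_name:
        el.set(q("articleName"), article_name)
    ET.SubElement(el, q("Object"), {q("namespace"): namespace, q("id"): ident})
    return el


def message(queries):
    """Build a MOBY envelope from (queryID, [articles]) pairs, one mobyData each."""
    root = ET.Element(q("MOBY"))
    content = ET.SubElement(root, q("mobyContent"))
    for query_id, articles in queries:
        data = ET.SubElement(content, q("mobyData"), {q("queryID"): query_id})
        data.extend(articles)
    return root


keywords = ["kinase", "membrane", "root"]  # hypothetical keywords

# Combined search: ONE query whose input is a bag (Collection) of keywords.
bag = ET.Element(q("Collection"), {q("articleName"): "keywords"})
bag.extend([simple("Keyword", kw) for kw in keywords])
combined = message([("1", [bag])])

# Per-keyword search: a separate mobyData and queryID for every keyword.
separate = message([(str(i), [simple("Keyword", kw, "keyword")])
                    for i, kw in enumerate(keywords, start=1)])

print(len(combined.findall(f".//{q('mobyData')}")))  # 1
print(len(separate.findall(f".//{q('mobyData')}")))  # 3
```

The number of mobyData blocks is what distinguishes "one query over a bag" from "one query per item".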

The confusion starts with the output of services. My understanding was 
that ONLY a service that is guaranteed to output exactly one object for 
each query (e.g. an averaging service that outputs the average of a 
list of inputs) is registered as outputting a Simple; all others have 
to output Collections. The reason: the response must contain exactly 
one mobyData matching the queryID of the input, and a mobyData may 
contain multiple Simple elements only if they are wrapped in a 
Collection.

This makes it impossible to distinguish between one query producing 
multiple unrelated results and one query producing one or more bags of 
related results. Imagine a "getnonoverlappingsubstrings" service: with 
input abc, it should output [a, bc], [ab, c] and [a, b, c]. Outputting 
the flat list [a, bc, ab, c, a, b, c] would not be useful.
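The loss of information can be shown with plain Python lists: a list of lists keeps each split as its own bag, while flattening throws the grouping away. This sketch assumes "non-overlapping substrings" means every way of cutting the string into two or more contiguous pieces, which reproduces the three groupings above; `splits` is a hypothetical stand-in for the service, not an existing one.

```python
from itertools import combinations


def splits(s):
    """Every way to cut s into two or more contiguous, non-overlapping pieces."""
    result = []
    n = len(s)
    # Each split is determined by a non-empty set of cut positions in 1..n-1.
    for k in range(1, n):
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            result.append([s[a:b] for a, b in zip(bounds, bounds[1:])])
    return result


grouped = splits("abc")
print(grouped)  # [['a', 'bc'], ['ab', 'c'], ['a', 'b', 'c']]

# Flattening keeps every piece but loses which pieces belong together.
flat = [piece for group in grouped for piece in group]
print(flat)  # ['a', 'bc', 'ab', 'c', 'a', 'b', 'c']
```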

A more biological example: Dirk has a service that returns sets of 
orthologous genes when given a set of species. For this, he needs 
Collections both for input and for output. The ideal situation in my 
view would be:

Input:
Collection of Species

Output:
Many Collections, each containing orthologous genes (one from each 
species). Here the Collection defines a set of orthologs, and using a 
Collection would be more elegant than having to define a Moby object 
"OrthologSet" that HAS (2...n) Objects.
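The ideal output above could be serialized roughly as follows: one mobyData for the query, holding several Collections, each of which is one ortholog set. This is a sketch of the proposed layout, not of what the current API interpretation allows; it assumes the namespace URI http://www.biomoby.org/moby, and the species namespaces and gene ids are placeholders.

```python
import xml.etree.ElementTree as ET

MOBY_NS = "http://www.biomoby.org/moby"  # assumed BioMoby namespace URI
ET.register_namespace("moby", MOBY_NS)


def q(name):
    """Qualify a tag or attribute name with the moby namespace."""
    return f"{{{MOBY_NS}}}{name}"


# Placeholder ortholog sets for ONE query (one gene per species in each set).
ortholog_sets = [
    [("SpeciesA", "geneA1"), ("SpeciesB", "geneB1")],
    [("SpeciesA", "geneA2"), ("SpeciesB", "geneB2")],
]

root = ET.Element(q("MOBY"))
content = ET.SubElement(root, q("mobyContent"))
data = ET.SubElement(content, q("mobyData"), {q("queryID"): "1"})
for orthologs in ortholog_sets:
    # Proposed semantics: each Collection is one meaningful bag (an ortholog set).
    coll = ET.SubElement(data, q("Collection"), {q("articleName"): "orthologs"})
    for species_ns, gene_id in orthologs:
        article = ET.SubElement(coll, q("Simple"))
        ET.SubElement(article, q("Object"),
                      {q("namespace"): species_ns, q("id"): gene_id})

# A consumer can recover the sets instead of one flat list of genes.
recovered = [[obj.get(q("id")) for obj in bag.iter(q("Object"))]
             for bag in data.findall(q("Collection"))]
print(recovered)  # [['geneA1', 'geneB1'], ['geneA2', 'geneB2']]
```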

In practice, the Collection tag has been used merely to indicate that 
more than one Simple occurs, with no "semantic" meaning. IMHO this is 
not necessary: when more than one Simple occurs, why not just put more 
than one Simple? It's easy enough for everyone to figure that out. 
Collection could then be used to actually convey meaning ;-)

This is a drastic change in the way the API is interpreted, and it 
will break code, so it needs calm thinking. But with the "big" API 
change for the Primitives coming up, it could be done. It would also 
make things clearer, e.g. for the Taverna developers: getting back 
many Simple articles in response to a query intuitively suggests 
continuing with each one individually, whereas getting back a 
Collection suggests passing the whole thing as input to the next 
service, which is what they implemented. That makes perfect sense, as 
there can and will be services that consume Collections.

E.g., the ortholog set case above: pipe it into a Multiple Alignment 
service, and you get one alignment for each set of orthologs. Getting 
one massive alignment of everything wouldn't make sense.
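A client following this reading could plan downstream calls roughly as in the Python sketch below: each top-level Simple in a mobyData triggers its own call, while each Collection is handed on whole as one input (e.g. one ortholog set per Multiple Alignment call). It is not Taverna's actual code; `plan_invocations` is hypothetical, the namespace URI is assumed, and the "NS" namespace and ids are placeholders.

```python
import xml.etree.ElementTree as ET

MOBY_NS = "http://www.biomoby.org/moby"  # assumed BioMoby namespace URI


def q(name):
    """Qualify a tag or attribute name with the moby namespace."""
    return f"{{{MOBY_NS}}}{name}"


RESPONSE = """<moby:MOBY xmlns:moby="http://www.biomoby.org/moby">
  <moby:mobyContent>
    <moby:mobyData moby:queryID="1">
      <moby:Simple><moby:Object moby:namespace="NS" moby:id="x1"/></moby:Simple>
      <moby:Simple><moby:Object moby:namespace="NS" moby:id="x2"/></moby:Simple>
      <moby:Collection>
        <moby:Simple><moby:Object moby:namespace="NS" moby:id="y1"/></moby:Simple>
        <moby:Simple><moby:Object moby:namespace="NS" moby:id="y2"/></moby:Simple>
      </moby:Collection>
    </moby:mobyData>
  </moby:mobyContent>
</moby:MOBY>"""


def ids(element):
    """IDs of all Objects under an element, in document order."""
    return [obj.get(q("id")) for obj in element.iter(q("Object"))]


def plan_invocations(mobydata):
    """One downstream call per top-level Simple; one call per whole Collection."""
    calls = []
    for child in mobydata:  # direct children only
        if child.tag == q("Simple"):
            calls.append(("simple", ids(child)))
        elif child.tag == q("Collection"):
            calls.append(("collection", ids(child)))
    return calls


root = ET.fromstring(RESPONSE)
for data in root.iter(q("mobyData")):
    for call in plan_invocations(data):
        print(call)
# ('simple', ['x1'])
# ('simple', ['x2'])
# ('collection', ['y1', 'y2'])
```

The design choice is that the XML structure alone decides iteration versus aggregation, so no out-of-band knowledge about the next service is needed for that decision.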

Maybe I've made myself clear, maybe not. Anyway, the Collection issue 
has led to quite a few discussions between Rebecca, Dirk and me, and 
none of us is happy with the way Collections are currently handled.

Best, Heiko




On 7 Jul 2005, at 15:45, Rebecca Ernst wrote:

Hi Eddie, Mark, Martin, Heiko, etc.!

We recently downloaded Taverna 1.2 and found that basically no 
workflow works anymore.
Looking into it, we found one major change from Taverna 1.1 to 
Taverna 1.2: Taverna 1.2 passes Collections on to the next service as 
a whole, whereas Taverna 1.1 took a Collection, split it into Simples 
and passed those Simples to the next service one by one.

There are very many services around that return Collections and very 
few that operate on them.
For the moment this means that most BioMoby services are not 
compatible with each other, even when one service outputs the same 
class of object that the next one takes as input!
(E.g. the service getAGILocusCodes returns a Collection of AGI codes; 
if you pass this on to the service getArabidopsisProteinSequences, it 
fails, as that service takes Simple inputs only.)

As far as we can see, there is only one solution: a new Moby widget 
'splitCollectionsIntoSingles', which would still be 'ugly', as you 
would also need to know what the first service returns and what the 
next service needs...
Any other ideas / suggestions?
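A widget of the kind proposed here might work roughly like the Python sketch below: it turns every Simple in a response (bare or inside a Collection) into its own query for a service that accepts Simple input only, which is approximately what Taverna 1.1 did implicitly. `split_collections_into_singles` is a hypothetical function, not an existing Taverna or BioMoby API; the namespace URI is assumed and the ids are placeholders. The `article_name` parameter reflects the 'ugly' part: the caller still has to know what the next service expects.

```python
import copy
import xml.etree.ElementTree as ET

MOBY_NS = "http://www.biomoby.org/moby"  # assumed BioMoby namespace URI
ET.register_namespace("moby", MOBY_NS)


def q(name):
    """Qualify a tag or attribute name with the moby namespace."""
    return f"{{{MOBY_NS}}}{name}"


def split_collections_into_singles(response, article_name):
    """Hypothetical splitter: one new mobyData/queryID per Simple in the response."""
    request = ET.Element(q("MOBY"))
    content = ET.SubElement(request, q("mobyContent"))
    simples = [s for query in response.iter(q("mobyData"))
               for s in query.iter(q("Simple"))]
    for n, simple in enumerate(simples, start=1):
        data = ET.SubElement(content, q("mobyData"), {q("queryID"): str(n)})
        single = copy.deepcopy(simple)
        # The caller must supply the articleName the next service expects.
        single.set(q("articleName"), article_name)
        data.append(single)
    return request


# Placeholder response: one query answered with a Collection of three codes.
response = ET.fromstring("""<moby:MOBY xmlns:moby="http://www.biomoby.org/moby">
  <moby:mobyContent>
    <moby:mobyData moby:queryID="1">
      <moby:Collection moby:articleName="codes">
        <moby:Simple><moby:Object moby:namespace="NS" moby:id="c1"/></moby:Simple>
        <moby:Simple><moby:Object moby:namespace="NS" moby:id="c2"/></moby:Simple>
        <moby:Simple><moby:Object moby:namespace="NS" moby:id="c3"/></moby:Simple>
      </moby:Collection>
    </moby:mobyData>
  </moby:mobyContent>
</moby:MOBY>""")

request = split_collections_into_singles(response, "locus")
print(len(request.findall(f".//{q('mobyData')}")))  # 3
```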


Best,
Rebecca

-- 
Rebecca Ernst
MIPS, Inst. for Bioinformatics
GSF Research Center for Environment and Health
Ingolstaedter Landstr. 1
85764 Neuherberg
phone: +49 89 3187 3583
email: Rebecca.Ernst at gsf.de



