[MOBY-dev] Re: Problems with Biomoby services in Taverna 1.2
Heiko Schoof
schoof at mpiz-koeln.mpg.de
Thu Jul 7 15:34:39 UTC 2005
Please let's take a minute and review the use of collections in
BioMoby, before we go off to create "workarounds"...
I myself am confused about the use of collections. Originally I had in
my mind that Collections were a construct to allow objects that
inherently belong together to be "bagged". Example: A multiple
alignment program that takes any number of sequences as input.
Example2: A keyword search that takes any number of keywords and then
does a query combining all of them. As opposed to: Inputting a list of
keywords and executing the keyword search for each of them, which would
require a separate moby:mobyData and queryID for each of the keywords.
The confusion starts with the output of services. My understanding was
that ONLY a service that is guaranteed to output exactly one object for
each query (e.g. an averaging service that outputs the average of a
list of inputs) is registered as outputting a Simple, all others have
to output collections (as there must be exactly one mobyData matching
the queryID of the input in the response, and a mobyData may contain
multiple Simple elements only if wrapped by a Collection).
This makes it impossible to distinguish between the situation where one
query produces multiple unrelated results, versus one or more bags of
related results. Imagine a "getnonoverlappingsubstrings" service: with
input abc, it should output [a, bc], [ab, c] and [a, b, c]. Outputting
[a, bc, ab, c, a, b, c] would not be useful.
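To make the grouping problem concrete, here is a minimal Python sketch of the hypothetical "getnonoverlappingsubstrings" service. The function name and the use of nested lists to stand for separate Collections are assumptions for illustration only; nothing here is BioMoby API. It cuts the input into two or more contiguous pieces, which matches the three results listed above.

```python
from itertools import combinations

def nonoverlapping_substrings(s):
    """Return every way of cutting s into two or more contiguous pieces.

    Each inner list plays the role of one Collection (one bag of related results).
    """
    results = []
    n = len(s)
    for k in range(1, n):                      # k = number of cut positions
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            results.append([s[i:j] for i, j in zip(bounds, bounds[1:])])
    return results

grouped = nonoverlapping_substrings("abc")
# Flattening mimics a single Collection holding every Simple.
flattened = [piece for group in grouped for piece in group]

print(grouped)    # [['a', 'bc'], ['ab', 'c'], ['a', 'b', 'c']]
print(flattened)  # ['a', 'bc', 'ab', 'c', 'a', 'b', 'c']
```

Once flattened, a consumer can no longer tell where one decomposition ends and the next begins, which is exactly the information a Collection boundary could carry.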
More biological example: Dirk has a service that returns sets of
orthologous genes when given a set of species. For this, he requires
collections both for input and for output. The ideal situation in my
view would be:
Input:
Collection of Species
Output:
Many Collections, each Collection contains orthologous genes (one from
each species). The Collection here defines a set of orthologs, and
using a collection would be more elegant than having to define a Moby
object "OrthologSet" that has (2...n) Objects.
In practice, the Collection tag has been used to indicate when more
than one Simple occurs, with no "semantic" meaning. This imho is not
necessary; when more than one Simple occurs, why not put more than one
Simple? It's easy enough for everyone to figure that out. Then,
Collection could be used to actually transfer meaning ;-)
This is a drastic change in the way the API is being interpreted, and
will break code. So it needs calm thinking. But with the "big" API
change coming up for the Primitives, it could be done. And it would
make things clearer, also to e.g. the Taverna developers: Getting back
many Simple articles in response to a query very intuitively indicates
to continue on with each one individually, whereas getting back a
Collection indicates to put the whole thing as input into the next
service, which is what they implemented. Makes perfect sense, as there
can and will be services that consume Collections.
E.g., the ortholog set case above: Pipe it into a Multiple Alignment
service, and you get all the alignments for each of the set of
orthologs. Getting one massive alignment of everything wouldn't make
sense.
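The two readings can be sketched in a few lines of Python. Everything below is a hypothetical model: the Simple and Collection classes, the function names and the gene labels are placeholders, not Taverna or BioMoby code. The first function follows the interpretation proposed above (one call per top-level article, a Collection handed over whole); the second unpacks every Collection first, as Taverna 1.1 is described as doing in this thread.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Simple:
    value: str

@dataclass
class Collection:
    members: List[Simple]

def invoke_per_article(service, articles):
    """Proposed reading: one call per top-level article; a Collection stays whole."""
    return [service(article) for article in articles]

def invoke_per_simple(service, articles):
    """Unpack every Collection into its Simples before calling the service."""
    singles = []
    for article in articles:
        singles.extend(article.members if isinstance(article, Collection) else [article])
    return [service(s) for s in singles]

def describe(article):
    """Stand-in for a downstream service; it only reports what it was handed."""
    if isinstance(article, Collection):
        return "set(" + ", ".join(m.value for m in article.members) + ")"
    return "single(" + article.value + ")"

# Two ortholog sets, each a Collection with one gene per species (placeholder names).
ortholog_sets = [
    Collection([Simple("geneA_sp1"), Simple("geneA_sp2")]),
    Collection([Simple("geneB_sp1"), Simple("geneB_sp2")]),
]

print(invoke_per_article(describe, ortholog_sets))
print(invoke_per_simple(describe, ortholog_sets))
```

With a multiple alignment service in place of describe, the first variant would give one alignment per ortholog set, while the second would lose the set boundaries altogether.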
Maybe I've made myself clear, maybe not. Anyway, the Collection issue
has led to quite some discussions between Rebecca, Dirk and myself, and
none of us is happy with the way Collections are currently handled.
Best, Heiko
On 7 Jul 2005, at 15:45, Rebecca Ernst wrote:
Hi Eddie, Mark, Martin, Heiko, etc.!
we recently downloaded Taverna 1.2 and found that basically no workflow
is functional anymore.
After a look into it we found that there was one major change from
Taverna 1.1 to Taverna 1.2: Taverna 1.2 evidently passes collections
on to the next service, whereas Taverna 1.1 took a collection, split
it into singles and passed the singles to the next service.
There are very many services around that give back collections and very
few that operate on those.
For the moment this means that most of the Biomoby services are not
compatible with each other even if they give back the same class of
object!
(e.g. using the service getAGILocusCodes you'll receive a collection of
AGI Codes and if you pass this on to the service
getArabidopsisProteinSequences it'll fail as this service takes simple
inputs only)
As far as we see there would only be one solution for that - we would
need a new moby widget 'splitCollectionsIntoSingles' which still would
be 'ugly' as you would also need to know what the first service gives
back and what the next service needs...
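As a rough idea of what such a widget would do, here is a Python sketch that turns one mobyData holding a Collection into several mobyData elements with one Simple each. The XML is deliberately simplified (no moby: namespace prefixes), and the function name, the derived queryID scheme and the ids "locus-1" and "locus-2" are all assumptions made for this example, not the real message format or real AGI codes.

```python
import xml.etree.ElementTree as ET
from typing import List

def split_collection_into_singles(mobydata_xml: str) -> List[str]:
    """Wrap each Simple found in the input in its own mobyData element.

    Each output gets its own queryID derived from the original one
    (a hypothetical "<id>_<n>" scheme).
    """
    data = ET.fromstring(mobydata_xml)
    base_id = data.get("queryID", "q")
    singles = []
    for n, simple in enumerate(data.iter("Simple"), start=1):
        out = ET.Element("mobyData", {"queryID": f"{base_id}_{n}"})
        out.append(simple)
        singles.append(ET.tostring(out, encoding="unicode"))
    return singles

response = (
    '<mobyData queryID="a1">'
    '<Collection articleName="codes">'
    '<Simple><Object id="locus-1"/></Simple>'
    '<Simple><Object id="locus-2"/></Simple>'
    '</Collection>'
    '</mobyData>'
)

for single in split_collection_into_singles(response):
    print(single)
```

The sketch splits blindly; it does not address the harder part mentioned above, namely knowing whether the next service actually wants Simples or a Collection.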
Any other ideas / suggestions?
Best,
Rebecca
--
Rebecca Ernst
MIPS, Inst. for Bioinformatics
GSF Research Center for Environment and Health
Ingolstaedter Landstr. 1
85764 Neuherberg
fon: +49 89 3187 3583
email: Rebecca.Ernst at gsf.de