[MOBY-l] further thoughts on atomic services & service description
Mark Wilkinson
mwilkinson at gene.pbi.nrc.ca
Tue Nov 5 15:35:06 UTC 2002
Hi all,
Another "head on a page" message - sorry!
The squatting octopus has released one of his tentacles and I am
starting to see a possible way forward with respect to service
descriptions that maintains the "central dogma" of MOBY (input >
transform > output). I think we still want to avoid limiting any of
those three parameters if possible because this flexibility allows
services to assist in the retrieval of 'tangential' information, but we
need to move toward unambiguously defining the transformation that is
occurring.
I have been harping on and on about 'atomic' services, and I'm sure you
are all sick to death of that (except for Martin, whom I still have to
convince ;-) ;-)). Well, too bad, here is some more harping!
Some service signatures strike me as unambiguous:
GenBank/GI > retrieve > GenBank/Record
Sequence > BLAST > Blast/Report
However, we freely allow registration of some real wacko services such as:
PubMed/ID > retrieve > Sequence
...??? huh?? what does "retrieve" mean in that service?? Our human
readable description might be "retrieves all sequences first published in
that manuscript". Okay, fine, but there is no way to make sense of that
service signature without human intervention.
One of the use cases we have been examining in our (Lincoln, Damian, and
Co.) conference calls is retrieving annotations corresponding to a GO
term. Even this simple case is a headache to describe in the current
MOBY world:
Let's say we register the service signature:
GO/Term > Retrieve > GenBank/GI
This seems trivial, and in fact, Lukas has already deployed a service at
TAIR that does exactly this, so we know that we can already handle such
cases in reality. Still, I don't think the service description is quite
so straightforward as it seems... My understanding of GO tells me that
any gene annotated to a child of a given node is also correctly described
as being annotated to the parent node. Thus what are we
"retrieving" - all genes annotated to that node, or all genes annotated
to that node and its children? The service signature, as it stands,
doesn't tell us this.
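To make the ambiguity concrete, here is a minimal sketch of the two readings of "Retrieve". Everything in it is an assumption for illustration: the term IDs, the GI values, the toy graph, and the function names are made up and are not part of MOBY or of the TAIR service.

```python
# Toy ontology: parent term -> child terms (hypothetical IDs).
CHILDREN = {
    "GO:parent": ["GO:child_a", "GO:child_b"],
    "GO:child_a": ["GO:grandchild"],
    "GO:child_b": [],
    "GO:grandchild": [],
}

# Genes annotated directly to each term (hypothetical GIs).
DIRECT_ANNOTATIONS = {
    "GO:parent": {"gi|1"},
    "GO:child_a": {"gi|2"},
    "GO:child_b": set(),
    "GO:grandchild": {"gi|3"},
}


def retrieve_direct(term):
    """Reading 1: only genes annotated to this exact node."""
    return set(DIRECT_ANNOTATIONS.get(term, set()))


def retrieve_with_descendants(term):
    """Reading 2: genes annotated to this node or any node below it."""
    genes, seen, stack = set(), set(), [term]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        genes |= DIRECT_ANNOTATIONS.get(node, set())
        stack.extend(CHILDREN.get(node, []))
    return genes
```

Both functions would fit the signature GO/Term > Retrieve > GenBank/GI, yet they return different answers for the same input.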
So... what happens if we break it down to two 'atomic' services:
1) GO/Term > Retrieve > Graph
2a) Graph > Retrieve > GenBank/GI
It is now starting to become less ambiguous what each of these services
does... we could still argue that the first service is ambiguous because
you have not specified whether you want the path to the parent in the
graph, or the graph below the given node... but I think we are getting
close.
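A rough sketch of the two-step breakdown above, with the remaining ambiguity of service 1 exposed as an explicit choice. The ontology, the GI values, the function names, the `direction` parameter, and the simplification of a "Graph" to a plain set of node IDs are all assumptions made for this example.

```python
# Toy ontology: child term -> parent terms (hypothetical IDs).
PARENTS = {
    "GO:root": [],
    "GO:mid": ["GO:root"],
    "GO:leaf": ["GO:mid"],
}

# Genes annotated directly to each term (hypothetical GIs).
ANNOTATIONS = {
    "GO:root": {"gi|10"},
    "GO:mid": {"gi|20"},
    "GO:leaf": {"gi|30"},
}


def _neighbours(node, direction):
    """Return the adjacent nodes to visit for the chosen direction."""
    if direction == "ancestors":
        return PARENTS.get(node, [])
    if direction == "descendants":
        return [child for child, parents in PARENTS.items() if node in parents]
    raise ValueError(f"unknown direction: {direction!r}")


def term_to_graph(term, direction):
    """Service 1 (GO/Term > Retrieve > Graph), with the direction made explicit."""
    nodes, stack = set(), [term]
    while stack:
        node = stack.pop()
        if node not in nodes:
            nodes.add(node)
            stack.extend(_neighbours(node, direction))
    return nodes


def graph_to_gis(graph):
    """Service 2a (Graph > Retrieve > GenBank/GI): genes on every node given."""
    gis = set()
    for node in graph:
        gis |= ANNOTATIONS.get(node, set())
    return gis
```

A client would chain them as `graph_to_gis(term_to_graph("GO:mid", "descendants"))`; the second step no longer has to guess which part of the ontology was meant.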
So, I'm left wondering - are there "primitives" in service description,
where every existing service can be conceptually broken down into a
series of steps each of which is adequately described by one of these
primitives? If so, then we have two possibilities that are in keeping
with the current central dogma of MOBY; either we insist that all
registered services be of a primitive type and these can be strung
together at the client end to do useful complex tasks, or we change the
central registry such that complex services register a "path" from input
to output object types as the "transform" part of their service
signature. I think the answer depends on how "thin" we want the MOBY
discovery/transport layer to be, and how much pain we want to put
service providers through...
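As a sketch of the second possibility only: a complex service could register its "transform" as a path of primitive signatures, and the registry could check that the path hangs together. The `Signature` class, the `PRIMITIVES` set, and `validate_path` are hypothetical names invented here; nothing like this exists in the current registry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    """One step of the form input > transform > output."""
    input_type: str
    transform: str
    output_type: str


# Hypothetical set of primitives the registry would accept.
PRIMITIVES = {
    Signature("GO/Term", "Retrieve", "Graph"),
    Signature("Graph", "Retrieve", "GenBank/GI"),
}


def validate_path(path):
    """Check a registered path and return its overall (input, output) types.

    Every step must be a known primitive, and each step's output type
    must match the next step's input type.
    """
    if not path:
        raise ValueError("a path needs at least one step")
    for step in path:
        if step not in PRIMITIVES:
            raise ValueError(f"not a registered primitive: {step}")
    for previous, following in zip(path, path[1:]):
        if previous.output_type != following.input_type:
            raise ValueError(
                f"{previous.output_type} does not feed {following.input_type}"
            )
    return path[0].input_type, path[-1].output_type
```

Under the first possibility the same check would simply run at the client end, when the client strings primitive services together itself.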
I have to run! Please comment/flame/criticize at will :-)
Rushing...
M
--
--------------------------------
"Speed is subsittute fo accurancy."
________________________________
Dr. Mark Wilkinson, RA Bioinformatics
National Research Council, Plant Biotechnology Institute
110 Gymnasium Place, Saskatoon, SK, Canada
phone : (306) 975 5279
pager : (306) 934 2322
mobile: markw_mobile at illuminae dot com