[MOBY-l] further thoughts on atomic services & service description

Tue Nov 5 15:35:06 UTC 2002

Hi all,

Another "head on a page" message - sorry!

The squatting octopus has released one of his tentacles and I am 
starting to see a possible way forward with respect to service 
descriptions that maintains the "central dogma" of MOBY (input > 
transform > output).  I think we still want to avoid limiting any of 
those three parameters if possible because this flexibility allows 
services to assist in the retrieval of 'tangential' information, but we 
need to move toward unambiguously defining the transformation that is 
occurring.

I have been harping on and on about 'atomic' services, and I'm sure you 
are all sick to death of that (except for Martin, whom I still have to 
convince ;-)  ;-)).  Well, too bad, here is some more harping!

Some service signatures strike me as unambiguous:

GenBank/GI > retrieve > GenBank/Record
Sequence > BLAST > Blast/Report

However, we freely allow registration of some real wacko services such as:

PubMed/ID > retrieve > Sequence

...??? Huh??  What does "retrieve" mean in that service??  Our 
human-readable description might be "retrieves all sequences first 
published in that manuscript".  Okay, fine, but there is no way to make 
sense of that service signature without human intervention.

One of the use cases we have been examining in our (Lincoln, Damian, and 
Co.) conference calls is retrieving annotations corresponding to a GO 
term.  Even this simple case is a headache to describe in the current 
MOBY world:

Let's say we register the service signature:

GO/Term > Retrieve > GenBank/GI

This seems trivial, and in fact, Lukas has already deployed a service at 
TAIR that does exactly this, so we know that we can already handle such 
cases in reality.  Still, I don't think the service description is quite 
as straightforward as it seems... My understanding of GO is that any 
gene annotated to a child of a given node is correctly described as 
being annotated to that parent node as well.  So what are we 
"retrieving": all genes annotated to that node, or all genes annotated 
to that node and its children?  The service signature, as it stands, 
doesn't tell us.
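
To make the two readings concrete, here is a rough Python sketch.  The 
toy GO fragment, the term labels, the GI numbers and the function names 
are all made up for illustration; none of this is an actual MOBY or TAIR 
interface.

```python
# Toy GO fragment: parent term -> child terms (labels are invented).
CHILDREN = {
    "GO:top": ["GO:mid", "GO:side"],
    "GO:mid": ["GO:leaf"],
    "GO:side": [],
    "GO:leaf": [],
}

# Hypothetical annotations: term -> GIs annotated directly to that term.
ANNOTATIONS = {
    "GO:top": {1001},
    "GO:mid": {1002},
    "GO:side": {1003},
    "GO:leaf": {1004},
}


def direct_gis(term):
    """Reading 1: only the genes annotated to this exact node."""
    return set(ANNOTATIONS.get(term, set()))


def inherited_gis(term):
    """Reading 2: genes annotated to this node or to any node below it."""
    found = set()
    seen = set()
    stack = [term]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # a DAG node can be reached by more than one path
        seen.add(node)
        found |= ANNOTATIONS.get(node, set())
        stack.extend(CHILDREN.get(node, []))
    return found
```

Both functions would fit the signature GO/Term > Retrieve > GenBank/GI, 
yet they return different answers for any term that has annotated 
children.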

So... what happens if we break it down to two 'atomic' services:

1) GO/Term > Retrieve > Graph
2a) Graph > Retrieve > GenBank/GI

It is now becoming less ambiguous what each of these services does... 
We could still argue that the first service is ambiguous, because it 
does not specify whether you get the path to the parent in the graph 
or the graph below the given node... but I think we are getting close.
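
Here is a rough Python sketch of the two-step version.  I am assuming 
the "graph below the given node" reading for service 1; the toy data and 
the function names are invented for illustration and are not part of any 
existing MOBY service.

```python
# Toy GO fragment and annotations (all labels and GI numbers are invented).
CHILDREN = {
    "GO:top": ["GO:mid", "GO:side"],
    "GO:mid": ["GO:leaf"],
    "GO:side": [],
    "GO:leaf": [],
}
ANNOTATIONS = {
    "GO:top": {1001},
    "GO:mid": {1002},
    "GO:side": {1003},
    "GO:leaf": {1004},
}


def term_to_graph(term):
    """Service 1 (GO/Term > Retrieve > Graph).

    Assumption: returns the subgraph rooted at `term`, as a dict mapping
    each node to its list of children.
    """
    graph = {}
    stack = [term]
    while stack:
        node = stack.pop()
        if node in graph:
            continue
        kids = list(CHILDREN.get(node, []))
        graph[node] = kids
        stack.extend(kids)
    return graph


def graph_to_gis(graph):
    """Service 2a (Graph > Retrieve > GenBank/GI).

    Returns every GI annotated to any node present in the graph.
    """
    gis = set()
    for node in graph:
        gis |= ANNOTATIONS.get(node, set())
    return gis


# The client strings the two atomic services together.
gis_for_mid = graph_to_gis(term_to_graph("GO:mid"))
```

The point is that the scope of the retrieval is now visible in the Graph 
object that passes between the two services, rather than hidden inside a 
single "Retrieve".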

So, I'm left wondering - are there "primitives" in service description, 
where every existing service can be conceptually broken down into a 
series of steps each of which is adequately described by one of these 
primitives?  If so, then we have two possibilities that are in keeping 
with the current central dogma of MOBY;  either we insist that all 
registered services be of a primitive type and these can be strung 
together at the client end to do useful complex tasks, or we change the 
central registry such that complex services register a "path" from input 
to output object types as the "transform" part of their service 
signature.  I think the answer depends on how "thin" we want the MOBY 
discovery/transport layer to be, and how much pain we want to put 
service providers through...
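
A rough Python sketch of what either option might look like, using the 
signatures from earlier in this message as the "primitives".  The 
Signature tuple, the PRIMITIVES list and find_path are hypothetical 
names for illustration only; they do not describe the current MOBY 
Central registry.

```python
from collections import deque
from typing import NamedTuple


class Signature(NamedTuple):
    """One service signature: input > transform > output."""
    input_type: str
    transform: str
    output_type: str


# Hypothetical registry contents, built from signatures in this message.
PRIMITIVES = [
    Signature("GO/Term", "Retrieve", "Graph"),
    Signature("Graph", "Retrieve", "GenBank/GI"),
    Signature("GenBank/GI", "Retrieve", "GenBank/Record"),
]


def find_path(primitives, start, goal):
    """Breadth-first search for a shortest chain of primitives that turns
    an object of type `start` into an object of type `goal`.

    Returns a list of Signature objects, or None if no chain exists.
    """
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        current, path = queue.popleft()
        if current == goal:
            return path
        for sig in primitives:
            if sig.input_type == current and sig.output_type not in visited:
                visited.add(sig.output_type)
                queue.append((sig.output_type, path + [sig]))
    return None


path = find_path(PRIMITIVES, "GO/Term", "GenBank/GI")
```

Under the first option the client would compute something like this 
path itself and call each primitive in turn; under the second, a complex 
service would register the same list of steps as the "transform" part of 
its signature.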

I have to run!  Please comment/flame/criticize at will :-)

Rushing...

M

-- 
--------------------------------
"Speed is subsittute fo accurancy."
________________________________

Dr. Mark Wilkinson, RA Bioinformatics
National Research Council, Plant Biotechnology Institute
110 Gymnasium Place, Saskatoon, SK, Canada

phone : (306) 975 5279
pager : (306) 934 2322
mobile: markw_mobile at illuminae dot com