[MOBY-l] lengthy missive on MOBY status after ISMB/I3C

Thu Aug 15 16:30:46 UTC 2002

Hi everyone,

Here's a first-pass update on the status of the MOBY project after
meeting with the I3C and chatting with the BOSC/ISMB attendees:

I3C were highly supportive of what we are doing, and several members
expressed interest in becoming involved.  Brian Gilman and I had a
serious discussion with them about the meaning of "open source", and
they appear to be moving in this direction.  An I3C member from IBM is
going to try re-coding MOBY-Central to run on UDDI - it will be
interesting to see whether that works, though I am reluctant to buy in
wholly to UDDI right away, both because of the licensing issues and
because of the desire to have the "freedom" for experimentation that a
home-grown registry gives us at this point.

Speaking of licensing issues, I was poking around the UDDI website and
it appears that the license that concerned us 6 months ago has either
been buried somewhere, or it has been changed.  Does anyone know
anything about this?  If they have changed the license to be more open,
that would be great news!  Please contact me if you have any information
about this...  Brian, can you ask your I3C colleagues if they know
what's up?  I can't remember who that fellow was who spoke up at the I3C
meeting agreeing with me that, although the original UDDI spec was open,
this was no longer the case.

Brian Gilman is lending his software-engineering expertise to
MOBY-Central and porting it to Java while fixing some of its nonsensical
quirks.

I met with Carole Goble at the I3C meeting and we had some deep chats
about myGrid & MOBY.  We have agreed to work closely together, and it
was exciting to discuss some of the more worrying aspects of the data
discovery problem with someone who's "been there and done that" :-)   I
also spent several hours with Philip Lord and Robert Stevens, who are
"in the trenches" of the myGrid project.  We hashed out what I think is
the critical difference between the myGrid and MOBY approaches to data
discovery:

In myGrid, a service-type description includes a designation of the
input and output for that service type.  In MOBY, inputs and outputs are
associated with a service *instance* rather than a service *type*.
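
To make the distinction concrete, here is a minimal Python sketch of the
two registry models.  The class and field names (MyGridStyleType,
MobyStyleInstance, input_kind, output_kind) are hypothetical and chosen
only for illustration; they are not taken from either project's actual
schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MyGridStyleType:
    """Service type that fixes input and output for all of its instances."""
    name: str
    input_kind: str
    output_kind: str


@dataclass(frozen=True)
class MobyStyleInstance:
    """Service instance that declares its own input and output."""
    service_type: str  # the type itself says nothing about inputs/outputs
    input_kind: str
    output_kind: str


# Type-level description: every Blast service has the same signature.
blast_type = MyGridStyleType("Blast", "Sequence", "BlastReport")

# Instance-level description: same service type, different signatures.
plain_blast = MobyStyleInstance("Blast", "Sequence", "BlastReport")
locus_blast = MobyStyleInstance("Blast", "TAIR/Locus", "GenBank/GI")


def instances_accepting(instances, input_kind):
    """Return the instances whose declared input matches input_kind."""
    return [s for s in instances if s.input_kind == input_kind]
```

In the instance-level model a client cannot infer what a service
consumes from its type alone; it has to look up each registered instance,
as instances_accepting() does here.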

This seems to be a "fiddly" difference, but it does lead to some
interesting discussions :-)   For example, a Blast service obviously
requires a sequence as input, gives a blast report as output, and
requires a compiled sequence database to search against (myGrid includes
the dependencies of a service in its service description... something
that we completely ignore).  In myGrid, a Blast service would pretty
much be defined that way, while in MOBY we could define a Blast
service which takes TAIR/Locus IDs as input and gives back GenBank/GIs
as output.  Clearly, under the hood, this service is somehow retrieving
the sequence associated with the TAIR/Locus then blasting it, and
parsing the output to get the GI numbers.  The question, I think, is
open:   how "dangerous" is it w.r.t. automated data discovery, to have
such flexible service specifications?  I'm not convinced that we are
correct in defining our services so loosely, but on the other hand, it
does allow us to have much "cooler" joins of data...  I'd actually like
to have a good thorough hashing out of this topic on the list so that we
explore all possibilities.  It may be that we just have to build it and
see, but if we can see potential problems right up front we should deal
with them.

A concern I have had for quite a while crystallized after the BOSC talk,
prompted by a question from Lincoln.  When a service registers, it may
register itself as dealing with a certain namespace; however, it doesn't
have to promise to deal with all instances in that namespace.  As such,
you might send data to that service that it can't deal with, even though
your data appears to fulfill all of the service requirements.  We
haven't actually specified what a service should do in this case (and we
probably should...soon!), but what struck me was a related scenario
where you send a list of objects to a service, and it returns a list of
response objects... but there is no way to correlate which input object
resulted in which output object!  What an awful oversight...  I see two
ways of dealing with this:

1)  A special base-object type like <Query namespace="My/NS" id="12345"/>
that contains the namespace and ID of the input object and appears as a
Cross-reference in the CrossReference block, or
2)  A new XML block "Query" containing the same info.

I am in favour of option (1)... just because it makes sense to me.
Clearly, an input object is a cross-reference for the output object, so
it "fits" in that section of the response, and presumably the namespace
and ID of the input object are sufficient to identify that object
uniquely among a list of inputs (the input object type/instance is
irrelevant in this case, I think...) so we don't need to know more about
it.  This would at least allow a client to keep track of which data has
been "dealt with" when it gets a response from a service.  That allows
for the following scenario:  A client has a list of input objects, and
tries several instances of a particular service type in turn to deal with
them.  It can send all inputs to the first service, figure out which
ones were dealt with and which were not, and send the remaining to
another instance of the same service type, and another, and another,
until all objects have been dealt with.
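
The fall-through scenario above could be sketched roughly as follows.
This is only an illustration under stated assumptions: each service is
modelled as a plain Python callable that returns (query_key, output)
pairs, where query_key stands in for the proposed Query cross-reference
(namespace plus ID of the input object), and inputs a service cannot
handle are simply left out of its response.  The names resolve_all and
make_service are hypothetical, not part of any MOBY API.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# An input object is identified by (namespace, id), as in option (1).
Key = Tuple[str, str]
# Hypothetical service: takes input keys, returns (query_key, output) pairs.
Service = Callable[[List[Key]], List[Tuple[Key, str]]]


def make_service(known: Dict[Key, str]) -> Service:
    """Build a toy service that handles only some IDs in its namespace."""
    def service(keys: List[Key]) -> List[Tuple[Key, str]]:
        # Echo the input key back with each output so the client can
        # correlate responses with requests.
        return [(key, known[key]) for key in keys if key in known]
    return service


def resolve_all(inputs: Iterable[Key], services: Iterable[Service]):
    """Try each service in turn, passing on only the unanswered inputs."""
    remaining = list(inputs)
    results: Dict[Key, List[str]] = {}
    for service in services:
        if not remaining:
            break
        for query_key, output in service(remaining):
            results.setdefault(query_key, []).append(output)
        # Anything with at least one output counts as "dealt with".
        remaining = [key for key in remaining if key not in results]
    return results, remaining
```

Without the echoed query key the client could not compute the remaining
list at all, which is the oversight described above.  Anything still in
remaining after the last service has been tried was handled by none of
them.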

There are other issues to discuss, but I think this is a lot to chew on
for the moment.

I hope we can have a lively discussion of these issues here on the list
before we start coding!

Cheers all!

M

--
--------------------------------
"Speed is subsittute fo accurancy."
________________________________

Dr. Mark Wilkinson, RA Bioinformatics
National Research Council, Plant Biotechnology Institute
110 Gymnasium Place, Saskatoon, SK, Canada

phone : (306) 975 5279
pager : (306) 934 2322
mobile: markw_mobile at illuminae.com