[MOBY-l] lengthy missive on MOBY status after ISMB/I3C

Mon Aug 19 13:05:04 UTC 2002

>>>>> "Mark" == mwilkinson  <mwilkinson at gene.pbi.nrc.ca> writes:

  Mark> Speaking of licensing issues, I was poking around the UDDI
  Mark> website and it appears that the license that concerned us 6
  Mark> months ago has either been buried somewhere, or it has been
  Mark> changed.  Does anyone know anything about this?  If they have
  Mark> changed the license to be more open, that would be great news!

This would indeed be good news. Most of our registry work has been
UDDI based...

I'm still quite keen to use UDDI based solutions where
possible. Although a registry is simple at first, as soon as you get
to issues of load balancing and fault tolerance, things get
nastier. These are issues which MOBY will clearly have to address. From
a myGrid perspective I would much rather have someone else, like those
working on UDDI, address them. I guess my approach to experimentation
is to use a standard until you know that it is broken, or that it just
will not do what you want it to.

  Mark> I met with Carole Goble at the I3C meeting and we had some
  Mark> deep chats about myGrid & MOBY.  

I presume this means you went to the bar and got drunk?

  Mark> I also spent several hours with Philip Lord and
  Mark> Robert Stevens, who are "in the trenches" of the myGrid
  Mark> project. 

I know this means we went to the bar and got drunk...

  Mark> In myGrid, a service-type description includes a designation
  Mark> of the input and output for that service type.  In MOBY,
  Mark> inputs and outputs are associated with a service *instance*
  Mark> rather than a service *type*.

  Mark> In myGrid, a Blast service would pretty much be defined that
  Mark> way, while in MOBY we could define a Blast service which
  Mark> takes TAIR/Locus id's as input and gives back GenBank/GI's as
  Mark> output.  Clearly, under the hood, this service is somehow
  Mark> retrieving the sequence associated with the TAIR/Locus then
  Mark> blasting it, and parsing the output to get the GI numbers.
  Mark> The question, I think, is open: how "dangerous" is it
  Mark> w.r.t. automated data discovery, to have such flexible service
  Mark> specifications?  I'm not convinced that we are correct in
  Mark> defining our services so loosely, but on the other hand, it
  Mark> does allow us to have much "cooler" joins of data...  I'd
  Mark> actually like to have a good thorough hashing out of this
  Mark> topic on the list so that we explore all possibilities.  It
  Mark> may be that we just have to build it and see, but if we can
  Mark> see potential problems right up front we should deal with
  Mark> them.

I don't think it's dangerous. It's just that the looser your service
descriptions, the less you can expect the computer to do for you. If
your ontology, for instance, does not distinguish between a Blast
service which takes a sequence and one which takes an ID, you could
give the wrong stuff to the wrong service. Or you could give an ID to
the wrong database, such as giving a SWISS-PROT ID to a TAIR-only
search facility.
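
A minimal sketch of that point, in Python. Everything in it is
hypothetical: the toy type hierarchy, the service names, and the
matching function are illustrations only, not anything from MOBY,
myGrid, or UDDI. It just shows that a loosely typed description lets a
SWISS-PROT ID be offered to a TAIR-only service, while a tighter
description rules it out.

```python
from dataclasses import dataclass

# Hypothetical toy ontology: child type -> parent type.
PARENT = {
    "TAIR_Locus_ID": "Identifier",
    "SwissProt_ID": "Identifier",
    "Identifier": "Data",
    "Sequence": "Data",
}


def is_a(data_type, wanted):
    """Return True if data_type is `wanted` or a descendant of it."""
    while data_type is not None:
        if data_type == wanted:
            return True
        data_type = PARENT.get(data_type)
    return False


@dataclass(frozen=True)
class Service:
    name: str
    input_type: str  # the type the service claims to accept


def services_accepting(services, data_type):
    """Names of the services whose declared input covers data_type."""
    return [s.name for s in services if is_a(data_type, s.input_type)]


# Loose descriptions: inputs declared only at a general level.
loose = [Service("tair_blast", "Identifier"), Service("seq_blast", "Data")]

# Tighter descriptions: inputs declared at the most specific level.
strict = [Service("tair_blast", "TAIR_Locus_ID"), Service("seq_blast", "Sequence")]

print(services_accepting(loose, "SwissProt_ID"))    # both services match
print(services_accepting(strict, "SwissProt_ID"))   # nothing matches
print(services_accepting(strict, "TAIR_Locus_ID"))  # only the TAIR service
```

With the loose descriptions the registry happily suggests the TAIR
service for a SWISS-PROT ID; with the strict ones it does not.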

I think this is a slightly separate issue from where the data is
stored. To me it makes sense to have all the metadata about a service
in one place, rather than have some of it in one place and some in
another. With your system, you would have to ask the class directory
to get information about BLAST services, and then the instance
repository to get information about the instances: which ones could
cope with TAIR IDs, which with sequences, and so on.
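
To make the two-lookup point concrete, here is a rough Python sketch.
The dictionaries stand in for a class directory and an instance
repository; their layout, the service names, and both lookup functions
are assumptions made up for illustration, not the actual MOBY or
myGrid design.

```python
# Hypothetical class-level metadata (what a BLAST service is).
class_directory = {
    "BLAST": {"description": "sequence similarity search"},
}

# Hypothetical instance-level metadata (what each instance accepts).
instance_repository = {
    "tair_blast": {"service_class": "BLAST", "input": "TAIR_Locus_ID"},
    "seq_blast": {"service_class": "BLAST", "input": "Sequence"},
}


def find_two_step(service_class, input_type):
    """Ask the class directory first, then the instance repository."""
    if service_class not in class_directory:  # first query
        return []
    return sorted(  # second query
        name
        for name, meta in instance_repository.items()
        if meta["service_class"] == service_class and meta["input"] == input_type
    )


# The alternative: one store holding class and instance metadata together.
combined = {
    name: {**class_directory[meta["service_class"]], **meta}
    for name, meta in instance_repository.items()
}


def find_one_step(service_class, input_type):
    """Answer the same question from the single combined store."""
    return sorted(
        name
        for name, meta in combined.items()
        if meta["service_class"] == service_class and meta["input"] == input_type
    )


print(find_two_step("BLAST", "TAIR_Locus_ID"))
print(find_one_step("BLAST", "TAIR_Locus_ID"))
```

Both functions give the same answer; the difference is only in how
many places the client has to ask.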

This, of course, gets back to the gritty question of the relationship
between the metadata and the instance directory. Should they be one,
or two?

At heart I do not think our approaches are that different. If you
imagine a simple system where, for instance, you only have one service
of each type, the distinction between the class and the instance
becomes fairly moot, and our two systems become much the same. 

  James> I believe that our way better captures the task at hand -
  James> here's a data type I've got, what can I do with it? Unless
  James> the types are broad, how are biologists supposed to
  James> understand the nuances between 30 different inter-related
  James> BLAST-types? But if they ask for a blast service, and they
  James> have a list to choose from, we should be giving them a way to
  James> learn what output they'll be getting back.

If you have loosely defined datatypes then the system will tell you
what you can do with your data, and include things that, in fact, you
cannot, like giving a SWISS-PROT ID to a TAIR BLAST search.

As for the biologist understanding the inter-related BLAST types, they
don't have to. The reason for having a complex and well specified
ontology is so that the computer can work all this stuff out. How you
present it to a biologist is a different issue. Presenting an
abstraction, or a view, over a more complex data structure is much
easier than trying to get the computer to infer a more complex data
structure from a simpler one.
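
As a small illustration of that asymmetry, here is a Python sketch.
The service descriptions and field names are invented for the example.
Collapsing rich descriptions into a simple "BLAST services" view is a
few lines of code; going the other way, from the simple view back to
the input and output types, is not possible because that information
has been thrown away.

```python
# Hypothetical rich service descriptions.
rich = [
    {"name": "tair_blast", "algorithm": "BLAST",
     "input": "TAIR_Locus_ID", "output": "GenBank_GI"},
    {"name": "seq_blast", "algorithm": "BLAST",
     "input": "Sequence", "output": "GenBank_GI"},
]


def biologist_view(descriptions):
    """Group services by algorithm, hiding the input/output detail."""
    view = {}
    for d in descriptions:
        view.setdefault(d["algorithm"], []).append(d["name"])
    return view


simple = biologist_view(rich)
print(simple)  # one "BLAST" entry listing both services

# The simple view no longer says what each service accepts or returns,
# so the rich description cannot be rebuilt from it.
```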

There is a real issue here: if we describe our services in terms
of DAML+OIL, then the descriptions become complex to produce. Having
talked to Mark, I agree with him that MOBY and myGrid have got to make
it easy for service providers. Equally, if it is too simple, then the
metadata will not be able to do much for biologists, because the
descriptions will not be rich enough for a computer to work with. How
we square this circle is unclear, and something that I think only
time and experimentation will tell.

  Mark> We haven't actually specified what a service should do in this
  Mark> case (and we probably should...soon!), but what struck me was
  Mark> a related scenario where you send a list of objects to a
  Mark> service, and it returns a list of response objects... but
  Mark> there is no way to correlate which input object resulted in
  Mark> which output object!  What an awful oversight..

Perhaps I have misunderstood the problem here. 

Could you not have an "exception" message, which would come back in
reply to a query and say "these two bits of the query failed"? This
would allow you to tell which has failed and which has not. Or is there
a more generic problem, that you haven't worked out how to link parts
of a "multi-part" query to parts of a multi-part answer, whether that
answer is a result or a failure report?
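
Something along these lines is what I have in mind, sketched in
Python. It is only an illustration under assumptions: the part IDs,
the report layout, and the toy lookup service (with made-up
identifiers) are all hypothetical, not a proposed MOBY message
format. Each part of the query gets an ID, and the answer carries
either a result or a failure report under that same ID.

```python
def run_batch(service, inputs):
    """Call `service` on each input; key every answer by its part ID."""
    report = {}
    for part_id, item in enumerate(inputs):
        try:
            report[part_id] = {"input": item, "ok": True,
                               "result": service(item)}
        except Exception as exc:
            # A failure is reported against the part that caused it.
            report[part_id] = {"input": item, "ok": False,
                               "error": repr(exc)}
    return report


def toy_lookup(locus_id):
    """Hypothetical service: maps placeholder locus IDs to placeholder GIs."""
    table = {"LOCUS_A": "GI_1", "LOCUS_C": "GI_3"}  # made-up data
    return table[locus_id]  # raises KeyError for unknown IDs


report = run_batch(toy_lookup, ["LOCUS_A", "LOCUS_B", "LOCUS_C"])
failed = [part for part, entry in report.items() if not entry["ok"]]
print(failed)  # the part IDs whose lookups failed
```

Because every entry in the report is keyed by the part ID, the client
can tell which input produced which output, and which inputs failed.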

Cheers

Phil

-- 
Phillip Lord,				Phone: +44 (0) 161 275 6138
PostDoctoral Research Associate,        Email: p.lord at russet.org.uk
Department of Computer Science          http://www.russet.org.uk
University of Manchester                
Oxford Road
Manchester
M13 9PL