[MOBY-dev] Re: [MOBY] join operations

Phillip Lord p.lord at russet.org.uk
Thu Jan 23 13:20:18 UTC 2003


>>>>> "Mark" == Mark Wilkinson <markw at illuminae.com> writes:

  Mark> Hi Ken,

  Mark> A bit of a rambling response... as it is 10:00Pm on my last
  Mark> day of work (Phil L. knows exactly what this means ;-) )

  Mark> I'm c.c'ing this response to the moby-dev list because I think
  Mark> it jarrs a lot of nerves that need to be jarred...

  Mark> I have a big smile on my face right now... not because I have
  Mark> an answer, but because I *don't* have an answer and wish I did
  Mark> (or more importantly, insist that the final MOBY spec does!).;

  Mark> At the moment MOBY handles *only* queries of the type:

  Mark> (discover, and) select n from foreignservicen where value=x

  Mark> I can't state strongly enough how crappy MOBY is at solving
  Mark> any more complex problem than that!


There are two basic problems that Ken seems to be raising. First, if I
want to combine the results of two services, that is perform a join
operation, then how do I do this? In general this should not be too
hard. All you need is some method for checking equality of data
returned, which in this case is proteins, or more likely protein
identifiers. 
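As a rough illustration of the join described above, here is a minimal Python sketch. It assumes each service returns a list of records carrying a shared identifier; the field name `protein_id`, the sample records and the function name are all hypothetical and not part of MOBY.

```python
from collections import defaultdict


def join_on_identifier(left, right, key="protein_id"):
    """Hash join: pair each record in `left` with the records in `right`
    that carry an equal identifier under `key`."""
    index = defaultdict(list)
    for record in right:
        index[record[key]].append(record)

    joined = []
    for record in left:
        # Equality of identifiers is the only check needed for the join.
        for match in index.get(record[key], []):
            joined.append({**record, **match})
    return joined


# Hypothetical results from two independent services.
interactions = [
    {"protein_id": "P12345", "partner": "Q67890"},
    {"protein_id": "P99999", "partner": "P12345"},
]
localisation = [
    {"protein_id": "P12345", "location": "nucleus"},
]

result = join_on_identifier(interactions, localisation)
```

Only the record whose identifier appears in both result sets survives the join.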

The second problem, which is much more complex, is how to do this
efficiently. There are various speed-ups that can be performed (like
doing the join on the remote machine, or more generally ensuring that
the query is performed so that the smallest amount of data transfer
takes place).
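One way to picture the data-transfer point is the sketch below, which compares fetching everything and filtering locally against sending only the identifiers and letting the remote side filter (sometimes called a semi-join). The "remote service" here is just a local dictionary, and every name in it is an assumption made for illustration.

```python
# Stand-in for data held by a remote service (hypothetical records).
REMOTE_RECORDS = {
    "P1": {"protein_id": "P1", "location": "nucleus"},
    "P2": {"protein_id": "P2", "location": "membrane"},
    "P3": {"protein_id": "P3", "location": "cytoplasm"},
    "P4": {"protein_id": "P4", "location": "nucleus"},
}


def remote_fetch_all():
    """Naive plan: the remote side returns every record it holds."""
    return list(REMOTE_RECORDS.values())


def remote_fetch_matching(ids):
    """Semi-join plan: the remote side filters by the identifiers sent."""
    return [REMOTE_RECORDS[i] for i in sorted(ids) if i in REMOTE_RECORDS]


# Identifiers we already hold from some other service.
local_ids = {"P2", "P9"}

# Plan 1: transfer everything, then filter locally.
everything = remote_fetch_all()
naive = [r for r in everything if r["protein_id"] in local_ids]

# Plan 2: transfer only the identifiers out and the matches back.
pushed = remote_fetch_matching(local_ids)
```

Both plans give the same answer, but the first moves every remote record across the wire while the second moves only the matching one.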

We do have people working on this within myGrid, and I've cc'd this
reply to at least one of them!


  Mark> This leads to the bane of my life right now... MOBY is
  Mark> wonderful at solving the most basic (and common) search and
  Mark> retrieve problem; 

This is a good thing to have achieved. Even if MOBY never does anything
more than solve the basic problems, it will make life significantly
better!




  Mark> MOBY services (today) have the signature
  Mark> INPUT+TRANSFORM+OUTPUT, and this is supposed to be sufficient
  Mark> for a client to identify the desired service... but it is
  Mark> obviously not.  Your desired service is a perfect example of
  Mark> that!  How do you describe the "transform" that you are
  Mark> making?!?

  Mark> I think, in parallel with the fantastic work that Damian and
  Mark> Andrew are doing v.v service description and data transport
  Mark> technology, we need to spend energy thinking deeply about
  Mark> service type description - I see this as a critical problem
  Mark> (and I think that myGrid has a lot to teach us about this!!).



Well, I hope that we can do something to help. I'm currently working
on service discovery, using the service ontology that Chris Wroe
wrote. For an initial demonstrator we will have a very simple query
builder (similar to the complex_client example from biomoby). It
should be able to do "give me all services which take a sequence as an
input", but because it's using DAML+OIL underneath, it becomes simple to
teach it that, for instance, a Swissprot record is a source of protein
sequences, but also of a lot of annotation, with, for instance, MEDLINE
identifiers included.
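To make the idea concrete, here is a toy Python sketch of discovery by type hierarchy. A hand-written dictionary stands in for the DAML+OIL ontology and its reasoner, and "is a source of" is treated as if it were a plain subclass edge, which is a simplification. The concept names, service names and functions are all hypothetical.

```python
# Hypothetical hierarchy: each concept maps to its more general concepts.
PARENTS = {
    "protein_sequence": {"sequence"},
    "dna_sequence": {"sequence"},
    # A Swissprot record is treated as a source of sequence and annotation.
    "swissprot_record": {"protein_sequence", "annotation"},
}

# Hypothetical registry: service name -> declared input type.
SERVICES = {
    "blast": "protein_sequence",
    "annotate": "swissprot_record",
    "fetch_abstract": "medline_id",
}


def is_kind_of(concept, target):
    """Return True if `target` is reachable from `concept` via PARENTS."""
    stack, seen = [concept], set()
    while stack:
        current = stack.pop()
        if current == target:
            return True
        if current in seen:
            continue
        seen.add(current)
        stack.extend(PARENTS.get(current, ()))
    return False


def services_taking(target):
    """Names of services whose input type is `target` or more specific."""
    return sorted(
        name for name, input_type in SERVICES.items()
        if is_kind_of(input_type, target)
    )


matches = services_taking("sequence")
```

Teaching the system something new is then one extra entry in the hierarchy, not a change to every service description.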


I'm hoping that I can turn this into a proper demonstrator within a
month or two. Unlike biomoby, the "semantic discovery" will be split
from the registration process. The paradigm here is a bit like
Google: you "register" in one way (for the web, just by sticking it
on the web), and then you search in another. This makes the system
more extensible, as you might also want to search in other ways. So
you might want to say "give me all services taking sequences, and
which have been working for 90% of the last week, and have been set up
on our local site", and so on. 
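A query like that can be sketched as a set of independent predicates applied to registry entries, as below. The record fields, the uptime figures and the registry contents are invented for the example; a real system would get the semantic part from the ontology, not from a string comparison.

```python
from dataclasses import dataclass


@dataclass
class ServiceRecord:
    name: str
    input_type: str
    uptime_last_week: float  # fraction of the week the service responded
    site: str


def search(records, *predicates):
    """Return names of the records that satisfy every predicate."""
    return [r.name for r in records if all(p(r) for p in predicates)]


# Hypothetical registry contents.
REGISTRY = [
    ServiceRecord("blast_local", "sequence", 0.97, "local"),
    ServiceRecord("blast_remote", "sequence", 0.99, "remote"),
    ServiceRecord("flaky_align", "sequence", 0.60, "local"),
    ServiceRecord("fetch_abstract", "medline_id", 0.95, "local"),
]

hits = search(
    REGISTRY,
    lambda r: r.input_type == "sequence",   # semantic criterion
    lambda r: r.uptime_last_week >= 0.9,    # reliability criterion
    lambda r: r.site == "local",            # locality criterion
)
```

Because each criterion is a separate predicate, new ways of searching can be added without touching how services are registered.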

The other point in doing this is that if a search service is just a
normal registered service, then the issue of querying across several
services efficiently becomes the same issue discussed at the start of
the email!

It might be quite good to discuss these issues sometime. I'm reluctant
to attend the telecon regularly (it's a bit late at night for me, and
I'd have to do it from home, which adds the complexity of claiming
back the costs), but I can attend sometimes, if there is a good
reason! Perhaps in a month or so's time, a service
discovery/registration telecon would be a good idea.



Cheers


Phil


