[MOBY-dev] Re: [MOBY] join operations

Thu Jan 23 19:48:51 UTC 2003

Hello everyone,

See comments inline.

>>>>>> "Mark" == Mark Wilkinson <markw at illuminae.com> writes:
>
>  Mark> Hi Ken,
>
>  Mark> A bit of a rambling response... as it is 10:00 PM on my last
>  Mark> day of work (Phil L. knows exactly what this means ;-) )
>
>  Mark> I'm c.c'ing this response to the moby-dev list because I think
>  Mark> it jars a lot of nerves that need to be jarred...
>
>  Mark> I have a big smile on my face right now... not because I have
>  Mark> an answer, but because I *don't* have an answer and wish I did
>  Mark> (or more importantly, insist that the final MOBY spec does!).
>
>  Mark> At the moment MOBY handles *only* queries of the type:
>
>  Mark> (discover, and) select n from foreignservicen where value=x
>
>  Mark> I can't state strongly enough how crappy MOBY is at solving
>  Mark> any more complex problem than that!
>
>
>There are two basic problems that Ken seems to be raising. First, if I
>want to combine the results of two services, that is perform a join
>operation, then how do I do this? In general this should not be too
>hard. All you need is some method for checking equality of data
>returned, which in this case is proteins, or more likely protein
>identifiers.
>
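
To make the join concrete, here is a minimal sketch in Python. It is not MOBY or myGrid code: the `Hit` record, the `normalise` equality rule and the sample identifiers are all invented for illustration, and it assumes both services return a protein identifier that can be compared after trivial clean-up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    protein_id: str  # shared identifier used as the join key (assumed)
    payload: str     # whatever else the service returned


def normalise(protein_id: str) -> str:
    """Equality rule for identifiers: ignore case and surrounding whitespace."""
    return protein_id.strip().upper()


def join_on_protein_id(left, right):
    """Hash join: index one result set by key, then probe it with the other."""
    index = {}
    for hit in left:
        index.setdefault(normalise(hit.protein_id), []).append(hit)
    return [
        (a, b)
        for b in right
        for a in index.get(normalise(b.protein_id), [])
    ]


# Made-up results from two hypothetical services.
service_a = [Hit("ID0001", "kinase"), Hit("ID0002", "transporter")]
service_b = [Hit("id0001 ", "expressed in root"), Hit("ID0009", "expressed in leaf")]

joined = join_on_protein_id(service_a, service_b)
for a, b in joined:
    print(a.protein_id, a.payload, "|", b.payload)
```

The hard part in practice is the equality rule itself, since two services may name the same protein differently; the sketch only shows where such a rule would plug in.
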
>The second problem, which is much more complex, is how to do this
>efficiently. There are various speed-ups that can be performed (like
>doing the join on the remote machine, or more generally ensuring that
>the query is performed so that the smallest amount of data transfer
>takes place).
>
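
One way to picture the data-transfer point is a semi-join: ship the (small) set of identifiers to the remote side and get back only the matching records. The sketch below is an assumption-laden toy, not the DQP design; `RemoteService`, `fetch_all` and `fetch_matching` are hypothetical names, and the "remote" side is simulated in-process with a counter standing in for network traffic.

```python
class RemoteService:
    """Stand-in for a remote service; counts records returned to the caller."""

    def __init__(self, records):
        self._records = records  # dict: protein id -> annotation
        self.transferred = 0

    def fetch_all(self):
        # Naive access: every record crosses the wire.
        self.transferred += len(self._records)
        return dict(self._records)

    def fetch_matching(self, wanted_ids):
        # The filter runs on the remote side, so only matches are returned.
        matches = {k: v for k, v in self._records.items() if k in wanted_ids}
        self.transferred += len(matches)
        return matches


def naive_join(local, remote):
    everything = remote.fetch_all()
    return {k: (local[k], everything[k]) for k in local if k in everything}


def semi_join(local, remote):
    matches = remote.fetch_matching(set(local))
    return {k: (local[k], matches[k]) for k in matches}


# Made-up data: two local hits joined against 1000 remote records.
local = {"ID0001": "kinase", "ID0002": "transporter"}
big = {f"ID{i:04d}": f"annotation {i}" for i in range(1000)}

naive_remote, semi_remote = RemoteService(big), RemoteService(big)
naive_result = naive_join(local, naive_remote)
semi_result = semi_join(local, semi_remote)

print(naive_remote.transferred, semi_remote.transferred)
```

Both calls produce the same join result; the difference is only in how many records come back. The semi-join also has to send the identifier set upstream, which the counter here ignores.
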
>We do have people working on this, within mygrid, and I've cc'd this
>reply to at least one of them!
>

Yes, we are working on a service-based distributed query processor (DQP)
which, along with integrating distributed data sources, can also
integrate data from a call to a web (or grid) service. The DQP relies on
the Open Grid Services Architecture (OGSA) and the OGSA Data Access and
Integration (OGSA-DAI) framework as the underlying platforms. The prototype
is due for release in July 2003. We are aiming to handle much more complex
queries than you (Mark) describe above. If you are interested in more detail
you can have a look at our design document in the MyGrid Twiki at the
following URL:

http://phoebus.cs.man.ac.uk/twiki/pub/Mygrid/DistributedQueryProcessor/OGSA-DAi_GDQSDesignv0.2.pdf

Nedim.

>
>  Mark> This leads to the bane of my life right now... MOBY is
>  Mark> wonderful at solving the most basic (and common) search and
>  Mark> retrieve problem;
>
>This is a good thing to have achieved. Even if moby never does anything
>more than solve the basic problems, it will make life significantly
>better!
>
>
>
>
>  Mark> MOBY services (today) have the signature
>  Mark> INPUT+TRANSFORM+OUTPUT, and this is supposed to be sufficient
>  Mark> for a client to identify the desired service... but it is
>  Mark> obviously not.  Your desired service is a perfect example of
>  Mark> that!  How do you describe the "transform" that you are
>  Mark> making?!?
>
>  Mark> I think, in parallel with the fantastic work that Damian and
>  Mark> Andrew are doing v.v service description and data transport
>  Mark> technology, we need to spend energy thinking deeply about
>  Mark> service type description - I see this as a critical problem
>  Mark> (and I think that myGrid has a lot to teach us about this!!).
>
>
>
>Well, I hope that we can do something to help. I'm currently working
>on service discovery, using the service ontology that Chris Wroe
>wrote. For an initial demonstrator we will have a very simple query
>builder (similar to the complex_client example from biomoby). It
>should be able to do "give me all services which take a sequence as an
>input", but because it's using DAML+OIL underneath, it becomes simple to
>teach it that, for instance, a Swissprot record is a source of protein
>sequences, but also of a lot of annotation, with, for instance, MEDLINE
>identifiers included.
>
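
The "teach it that a Swissprot record is a source of protein sequences" idea can be illustrated with a toy hierarchy lookup. This is only a sketch: it uses a plain Python dict rather than DAML+OIL or a real reasoner, and the concept names, service names and the `services_taking` helper are all made up for the example.

```python
# Hypothetical hierarchy: concept -> the more general concepts it provides.
PARENTS = {
    "swissprot_record": ["protein_sequence", "annotation"],
    "protein_sequence": ["sequence"],
    "dna_sequence": ["sequence"],
}

# Hypothetical registry: service name -> declared input concept.
SERVICES = {
    "align": "sequence",
    "translate": "dna_sequence",
    "find_domains": "protein_sequence",
    "show_annotation": "swissprot_record",
    "fetch_abstract": "medline_id",
}


def ancestors(concept):
    """Return the concept plus everything it transitively provides."""
    seen, stack = set(), [concept]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


def services_taking(concept):
    """Services whose declared input is the concept or a specialisation of it."""
    return sorted(
        name for name, declared in SERVICES.items()
        if concept in ancestors(declared)
    )


print(services_taking("sequence"))
print(services_taking("annotation"))
```

A plain string match on "sequence" would find only `align`; walking the hierarchy also finds the services declared against more specific inputs, and adding one new entry to `PARENTS` is all it takes to teach the lookup a new relationship.
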
>
>I'm hoping that I can turn this into a proper demonstrator within a
>month or two. Unlike biomoby, the "semantic discovery" will be split
>from the registration process. The paradigm here is a bit like
>google....you "register" in one way (for the web, just by sticking it
>on the web), and then you search in another. This makes the system
>more extensible, as you might also want to search in other ways. So
>you might want to say "give me all services taking sequences, and
>which have been working for 90% of the last week, and have been set up
>on our local site", and so on.
>
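
Splitting search from registration means a query can combine criteria the registry never knew about at registration time. A minimal sketch of that composition follows, assuming a flat list of service records; the `ServiceRecord` fields, predicate helpers and sample entries are hypothetical and not part of biomoby or myGrid.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceRecord:
    name: str
    input_type: str
    uptime_last_week: float  # fraction of successful probes, 0.0 to 1.0
    site: str


def takes(input_type):
    return lambda s: s.input_type == input_type


def uptime_at_least(fraction):
    return lambda s: s.uptime_last_week >= fraction


def hosted_at(site):
    return lambda s: s.site == site


def search(records, *predicates):
    """Keep the records that satisfy every predicate supplied by the caller."""
    return [s for s in records if all(p(s) for p in predicates)]


# Made-up registry contents.
records = [
    ServiceRecord("align", "sequence", 0.99, "local"),
    ServiceRecord("align_mirror", "sequence", 0.60, "local"),
    ServiceRecord("align_remote", "sequence", 0.95, "elsewhere"),
    ServiceRecord("fetch_abstract", "medline_id", 1.00, "local"),
]

hits = search(records, takes("sequence"), uptime_at_least(0.9), hosted_at("local"))
print([s.name for s in hits])
```

New kinds of criteria are just new predicate functions, so the search side can grow without touching how services are registered.
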
>The other point in doing this is that if a search service is just a
>normal registered service, then the issue of querying across several
>services efficiently becomes the same issue discussed at the start of
>the email!
>
>It might be quite good to discuss these issues sometime. I'm reluctant
>to attend the telecon regularly (it's a bit late at night for me, and
>I'd have to do it from home, which adds the complexity of claiming
>back the costs), but I can attend sometimes, if there is a good
>reason! Perhaps in a month or so's time, a service
>discovery/registration telecon would be a good idea.
>
>
>
>Cheers
>
>
>Phil