[MOBY-dev] Re: [MOBY] join operations

Thu Jan 23 17:59:52 UTC 2003

On Thu, 2003-01-23 at 11:15, Ken Steube wrote:

> At first thought it seems ugly, but:
> 
> My idea uses SOAP exactly the way it was intended to be used ...
> collecting info from many servers and combining it into a single coherent
> result.  It's simple to implement, could be very robust and fault
> tolerant, and leaves all kinds of room for speed optimization.

In many ways I agree with you, but I see this scenario as supra-MOBY,
rather than core MOBY.  When you say, "collecting info from many servers
and combining it", I immediately think "[an agent/client] collecting
info from many [MOBY services] and combining it".

> To achieve any kind of efficiency a join will have to be a server-side
> operation. Since biological databases are generally small our limiting
> factor is network bandwidth.

I don't know that you achieve any speed-up if the data you are joining
is not all resident on the one machine... and if it is, then the problem
of data integration isn't a big problem in any case :-)

> A join is perfectly abstractable.  We can write a single join client and
> copy it identically to every MOBY service provider.  Pass the join request
> to the join service on the appropriate server, it fetches both sets of
> data, does the join, and returns to the client or forwards to another
> service for additional operations.

Yup.  I think this is what Phil was also suggesting.  I just wonder how
much advantage is gained by having this at the service end rather than
the client end... In the case you are describing, if the data you are
joining exists on, say, three machines, you first collect the dataset
from the local machine, pass the second dataset over the network and
make the intersection on the local machine, pass the third dataset over
the network and make the intersection again, then pass the result of
the intersection back to the client.  This puts a potentially unwanted
computational/memory burden on the server... If it is client-side then
your requests are your problem:  you request the three datasets, all
three are passed over the network, and you do the join on your own
machine.  The difference in network traffic is marginal (3 datasets vs
2 + the intersection), and the processing power you use is your own.
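
To put rough numbers on that comparison, here is a minimal Python sketch.
Everything in it is an assumption made for illustration: the function
names, the toy identifier sets, and the choice to measure traffic as a
count of records rather than bytes.  It is not part of any MOBY API.

```python
# Hypothetical sketch: compare network traffic for a three-way join
# (an intersection of identifier sets) done client-side vs server-side.
# "Traffic" is counted in records, which is a simplifying assumption.

def client_side_join(datasets):
    """Client fetches every dataset, then intersects them locally."""
    transferred = sum(len(d) for d in datasets)  # all datasets cross the network
    result = set.intersection(*map(set, datasets))
    return result, transferred


def server_side_join(local, remotes):
    """Server holding `local` pulls each remote dataset, intersects,
    and sends only the final intersection back to the client."""
    result = set(local)          # the local dataset never crosses the network
    transferred = 0
    for remote in remotes:
        transferred += len(remote)   # remote dataset travels to the joining server
        result &= set(remote)        # intersection is computed on the server
    transferred += len(result)       # final result travels back to the client
    return result, transferred


# Toy datasets (made-up identifiers) living on three machines.
a = {"P1", "P2", "P3", "P4"}
b = {"P2", "P3", "P4", "P5"}
c = {"P3", "P4", "P6"}

client_result, client_traffic = client_side_join([a, b, c])
server_result, server_traffic = server_side_join(a, [b, c])

print(sorted(client_result), client_traffic)  # ['P3', 'P4'] 11
print(sorted(server_result), server_traffic)  # ['P3', 'P4'] 9
```

In this toy case the server-side join moves 9 records against 11 for the
client-side one.  The saving is exactly the size of the local dataset
minus the size of the intersection, and the server pays for it in CPU
and memory.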

I dunno... I'm just not convinced... yet!

> If we want to do MOBY queries in an SQL-like language 

I think the jury is still out on that question too :-)

M

-- 
=======================================================================
                                    |--==\
Mark Wilkinson                       \==-|       1001010010010001001010
Bioinformatics Consultant             \=/        0010010010100101110010
Illuminae Media                       /-\        0010101110110100100101
727 6th Ave. N.                      /-==|       0010100100111101010010
Saskatoon, SK, Canada               |==-/        0101001000100101001011
S7K 2S8                              \=/         0100100100010010010101
+1 (306) 373 3841                     /\         1110101101110101001001
markw at illuminae.com                  /=-\        1101001010100101010101
                                    |--==\
=======================================================================