[MOBY-l] Genomic position-based GO search...

Mark Wilkinson mwilkinson at gene.pbi.nrc.ca
Thu Nov 21 19:46:59 UTC 2002


Hi Simon!

You should see the huge smile on my face right now :-)

This is **precisely** the type of problem that we are hoping to solve in 
the BioMOBY project (you should submit it to our UseCase repository, or 
send it to Damian).   In fact, I think I can imagine a service path that 
could be implemented under the existing MOBY architecture that would 
solve exactly this query!

The "issue", quite frankly, is buy-in from the service providers; we 
can't *force* anyone to set up MOBY services (though we can set up MOBY 
screen-scraper services to accomplish a few things...)  This raises, 
again, a point that I think is worth resolving by a discussion in the 
wider MOD/MOBY community, since they are the ones that will have to 
invest the time at the end of the day:

Ken Stuebe wrote a message yesterday (to the moby-guts list) indicating 
his interest in setting up a service, but also expressing concern that 
the MOBY spec is currently a moving target.  He is absolutely right!  We 
are still making some fairly fundamental design decisions.  However, 
between TAIR, BDGP and my own lab we have set up some fairly cool 
services that seem to work like pollywogs using the existing spec. 
They are far from being 100% machine readable - in fact, they 
*absolutely require* human interpretation of the transformation that is 
taking place in the service - but nevertheless it seems that we already 
have at least a quick-n-dirty solution to the immediate interoperability 
problem that does, in fact, work to a reasonable standard.

So... the question is, is it worth the effort to deploy a wider range of 
MOBY services using the existing spec (i.e. freeze this version of the 
spec) in the full knowledge that the spec will evolve, possibly 
significantly, over time and that we will likely have to re-deploy all 
of these services at some point in the future?  In essence, how 
immediate and serious is the interoperability problem TO YOU?

Now, shame on me for not writing **much** more clear and detailed 
versions of the existing spec - my bad!   I've been waffling on this 
precisely because of the "moving target" phenomenon... but let's assume 
that I could do this in a week or two, to the point where others would 
be able to set up services on their own and be fairly sure that they are 
"following the rules" as they exist today.  Would the MODs & other 
interested players be willing to step up to the plate at this early 
stage and start implementing the spec as it stands, in all its glory (or 
'gory')?

On a personal note: one useful side effect of doing this is that we get 
a *much* clearer picture of the behaviour of the MOBY system as it 
exists today.  Having only a dozen or so services is not effectively 
telling me (us) whether we are on the right track or completely out to 
lunch... *any* system can work if it only has to do 12 things...

I think it's clear where I stand on the issue ;-)  but it isn't me that 
has to bear the brunt of the effort, so I'd like opinions from the rest 
of you.  Are we solving a problem, or just causing ourselves more 
trouble in the future, by moving forward enthusiastically *right now*?

Can I have a show of hands from those who would be interested in 
"playing the game" (rolling the dice) right now, future spec be damned?

C'mon!  It'll be fun!   ... and useful!

Mark




Simon Twigger wrote:
> Hi there,
> 
> (Apologies for the GO/BioMOBY cross post but I think it's relevant to both)
> 
> As part of our ongoing work to implement an ontology schema within RGD  
> we were doing some use case analyses and one of the big things I think  
> our users (Rat geneticists/genomic people, positional cloners, etc)  
> want to do is to find out what genes are in their region of interest  
> (defined by a QTL, syntenic region, or similar) and from that, the GO  
> terms associated with those genes. The next step would be to build in  
> some sort of filter that allowed them to ask "What genes in this region  
> are part of XX process/component/function?" etc. I'm sure this is
> something that isn't restricted to Rat genomics.
> 
> I know that this isn't too hard to build for each individual db as they
> have the gene information, the mapping information and the GO
> information locally and it's all integrated. However, when you are doing
> comparative analyses (at least in our case) you know the syntenic
> region you are interested in but you don't have all the
> genes/positions/terms for the other organism(s) in your own database so
> you can't easily offer that functionality. You might not want to bump
> the user off to the other organism db to use their interface (if it
> exists) and this also wouldn't work if you want this functionality
> inside a tool rather than as a user-operated search function.
> 
> I'm wondering if this is functionality that could be provided either by
> GO, or by a db for their own organism, that others could use, thereby
> saving others the hassle of maintaining lots of info about the other
> organisms' genes and locations?
> 
> Two potential solutions that I thought of:
> Option 1 - add the mapping information into GO: add chromosome and  
> genomic location (and presumably build/reference map info). If you then  
> know the region you are interested in from the other organism you can  
> get all genes with GO terms of interest between START and STOP on  
> chromosome N. Downsides to this are adding yet more info to GO files  
> and schema, the hassle of keeping things up to date at GO, etc. and it  
> might not be worth the pain.
> 
> Option 2 - (I like this one) Have a standard API offered by a db  
> (webservices/BioMoby would seem to be a good fit here) that others can  
> call to extract this information: You pass in the chromosome, the map  
> and the region and optionally some GO terms that you want to use to  
> refine the returned results and the webservice returns a list of genes  
> in that region that match those criteria. On the MOBY front - I'm not
> sure if this is violating the atomic input > transform > output concept  
> by doing too much in the transform step and it could certainly be  
> broken down into component parts and joined back together.
> 
> What do others think about this? Ultimately I'd love to see this  
> genomic position based search expanded so I could pop a genome browser  
> on top to display not every gene/feature/SNP etc. in a region but only  
> those that match certain criteria - a genome-based search engine for  
> the db.
> 
> 
> Simon.
> 
> 
> ------------------------------------------------------------------------
> Simon Twigger, Ph.D.
> Assistant Professor, Bioinformatics Research Center
> 
> Medical College of Wisconsin
> 8701 Watertown Plank Road,
> Milwaukee, WI, 53226
> tel. 414-456-8802, fax 414-456-6595
> 
> _______________________________________________
> moby-l mailing list
> moby-l at biomoby.org
> http://biomoby.org/mailman/listinfo/moby-l
> 
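
For readers who want something concrete, the "Option 2" query in Simon's
message above could be sketched roughly as follows. This is only an
illustration under stated assumptions, not the MOBY API or any database's
real interface: the Gene record, the function names, the gene symbols and
the "GO:term1"-style identifiers are all made up, and the map/build
parameter Simon mentions is left out for brevity. It also shows the
"broken down into component parts and joined back together" idea: a
region -> genes step and a genes -> GO-filtered genes step, composed by a
small wrapper.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record: position plus attached GO term IDs."""
    symbol: str
    chromosome: str
    start: int
    stop: int
    go_terms: frozenset = field(default_factory=frozenset)


# Made-up records standing in for a database's gene, map and GO tables.
GENES = [
    Gene("geneA", "1", 1_000, 5_000, frozenset({"GO:term1"})),
    Gene("geneB", "1", 8_000, 12_000, frozenset({"GO:term2"})),
    Gene("geneC", "1", 20_000, 25_000, frozenset({"GO:term1", "GO:term2"})),
    Gene("geneD", "2", 1_000, 4_000, frozenset({"GO:term1"})),
]


def genes_in_region(genes, chromosome, start, stop):
    """Step 1: region -> genes overlapping [start, stop] on the chromosome."""
    return [
        g for g in genes
        if g.chromosome == chromosome and g.start <= stop and g.stop >= start
    ]


def filter_by_go(genes, go_terms):
    """Step 2: genes -> genes annotated with at least one requested term."""
    wanted = set(go_terms)
    return [g for g in genes if wanted & g.go_terms]


def search(chromosome, start, stop, go_terms=None, genes=GENES):
    """Compose the two steps; the GO filter is optional, as in Option 2."""
    hits = genes_in_region(genes, chromosome, start, stop)
    if go_terms:
        hits = filter_by_go(hits, go_terms)
    return [g.symbol for g in hits]
```

Calling search("1", 0, 30_000, ["GO:term1"]) asks for genes on chromosome 1
between 0 and 30,000 that carry the placeholder term. Keeping the two steps
as separate functions is one way each could be offered as its own small
service and chained by the caller.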


-- 
--------------------------------
"Speed is subsittute fo accurancy."
________________________________

Dr. Mark Wilkinson, RA Bioinformatics
National Research Council, Plant Biotechnology Institute
110 Gymnasium Place, Saskatoon, SK, Canada

phone : (306) 975 5279
pager : (306) 934 2322
mobile: markw_mobile at illuminae dot com




