[MOBY-l] Re: Genomic position-based GO search...

Fri Nov 22 14:31:27 UTC 2002

If you check out the genes and markers query form at MGI, you can see how 
we have approached this for the time being. Using this query form, one can 
search for a GO term and a map location.
You can also use the comparative maps part of the database to perform some 
cross-species stuff, at least for the orthology relationships that we capture.
Check it out.

David

At 04:28 PM 11/21/2002 -0600, Simon Twigger wrote:

>Hi Chris
>
>Your idea for #1 is intriguing and probably raises many questions in
>and of itself.  Certainly the nice feature about GO (e.g. AmiGO) is that
>you can run cross-organism queries in one place and this integration
>using the GO IDs as the primary key works very well. I guess I
>ultimately see the various databases we run as essentially being tables
>in some uber-database,  connected together by our common primary keys
>(accession numbers), the stumbling block currently being how to run the
>queries (we don't know what's in the tables and what the keys between the
>tables are). The question becomes how to achieve something like that -
>by having everything in one place and forced into a common structure to
>define the tables and keys or by distributing the components and having
>them connected by common protocols providing a level of abstraction
>between the query and the underlying structure.
>
>I'm not sure I fully understand what you mean in #2, that the approach
>doesn't scale - do you mean that we'd be trying to do too much in the
>'transform' step of the process and we'd end up writing lots of
>different APIs to handle slight differences in the query? Perhaps you
>could expand on your experience with the GO API, as this might help
>others understand the practical limitations of these things? Naively I
>can't see any reason why such a service couldn't be written, but I don't
>have practical experience suggesting why it would be a bad thing in
>practice.
>
>
>Simon.
>
>On Thursday, Nov 21, 2002, at 16:46 America/Chicago, Chris Mungall
>wrote:
>
>>Hi Simon
>>
>>Regarding (1) - I think it would be extremely useful to have a central
>>repository where all the gmods contribute various data files to allow
>>for easier building of warehouse dbs for performing the kind of queries
>>you mention. This is way outside the scope of GO, but it seems we have
>>all the right people listening here, so maybe some kind of spin-off.
>>
>>The problem with (2) is that the API approach doesn't scale to doing
>>complex queries - the GO API I wrote being a good example of this. APIs
>>are useful for restricted queries but often you need to step outside
>>the restrictions of the API and for this you need a language, not method
>>calls.
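
The contrast between method calls and a query language can be illustrated with a minimal sketch. This is not the GO API itself: the table layout, the function names `build_demo_db`, `genes_with_term` and `run_query`, and the sample genes are all hypothetical, chosen only to show a fixed call next to an ad hoc query that the fixed call cannot express.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with invented genes and annotations."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE gene (symbol TEXT PRIMARY KEY, chromosome TEXT,
                           start_pos INTEGER, stop_pos INTEGER);
        CREATE TABLE annotation (symbol TEXT, go_id TEXT);
        INSERT INTO gene VALUES ('geneA', '1', 100, 200),
                                ('geneB', '1', 900, 1000),
                                ('geneC', '2', 150, 250);
        INSERT INTO annotation VALUES ('geneA', 'GO:0006915'),
                                      ('geneB', 'GO:0006915'),
                                      ('geneC', 'GO:0005634');
    """)
    return con


def genes_with_term(con, go_id):
    """API-style method call: answers exactly one predefined question."""
    rows = con.execute(
        "SELECT symbol FROM annotation WHERE go_id = ? ORDER BY symbol",
        (go_id,),
    )
    return [row[0] for row in rows]


def run_query(con, sql, params=()):
    """Language-style access: the caller decides what to ask."""
    return [row[0] for row in con.execute(sql, params)]


con = build_demo_db()

# The fixed method handles the question it was designed for.
by_term = genes_with_term(con, "GO:0006915")

# A question combining GO term and map position needs either a new
# method or a query language; here it is a single SQL statement.
by_term_and_region = run_query(
    con,
    """SELECT g.symbol
       FROM gene g JOIN annotation a ON a.symbol = g.symbol
       WHERE a.go_id = ? AND g.chromosome = ?
         AND g.start_pos >= ? AND g.stop_pos <= ?
       ORDER BY g.symbol""",
    ("GO:0006915", "1", 0, 500),
)
```

Each new combination of criteria would require another method on the API side, whereas the query side only needs a different statement.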
>>
>>Of course, they're not incompatible: (2) could be used to build (1),
>>which combines the best of both distributed and centralised approaches.
>>
>>On Thu, 21 Nov 2002, Simon Twigger wrote:
>>
>>>Hi there,
>>>
>>>(Apologies for the GO/BioMOBY cross post but I think it's relevant to
>>>both)
>>>
>>>As part of our ongoing work to implement an ontology schema within RGD
>>>we were doing some use case analyses and one of the big things I think
>>>our users (Rat geneticists/genomic people, positional cloners, etc)
>>>want to do is to find out what genes are in their region of interest
>>>(defined by a QTL, syntenic region, or similar) and from that, the GO
>>>terms associated with those genes. The next step would be to build in
>>>some sort of filter that allowed them to ask "What genes in this region
>>>are part of XX process/component/function?" etc. I'm sure this is
>>>something that isn't restricted to Rat genomics.
>>>
>>>I know that this isn't too hard to build for each individual db as they
>>>have the gene information, the mapping information and the GO
>>>information locally and it's all integrated. However, when you are doing
>>>comparative analyses (at least in our case) you know the syntenic
>>>region you are interested in but you don't have all the
>>>genes/positions/terms for the other organism(s) in your own database, so
>>>you can't easily offer that functionality. You might not want to bump
>>>the user off to the other organism db to use their interface (if it
>>>exists) and this also wouldn't work if you want this functionality
>>>inside a tool rather than as a user-operated search function.
>>>
>>>I'm wondering if this is functionality that could be provided either by
>>>GO, or by a db for their own organism, that others could use, thereby
>>>saving others the hassle of maintaining lots of info about the other
>>>organisms' genes and locations?
>>>
>>>Two potential solutions that I thought of:
>>>Option 1 - add the mapping information into GO: add chromosome and
>>>genomic location (and presumably build/reference map info). If you then
>>>know the region you are interested in from the other organism you can
>>>get all genes with GO terms of interest between START and STOP on
>>>chromosome N. Downsides to this are adding yet more info to GO files
>>>and schema, the hassle of keeping things up to date at GO, etc. and it
>>>might not be worth the pain.
>>>
>>>Option 2 - (I like this one) Have a standard API offered by a db
>>>(webservices/BioMoby would seem to be a good fit here) that others can
>>>call to extract this information: You pass in the chromosome, the map
>>>and the region and optionally some GO terms that you want to use to
>>>refine the returned results and the webservice returns a list of genes
>>>in that region that match those criteria. On the MOBY front - I'm not
>>>sure if this is violating the atomic input > transform > output concept
>>>by doing too much in the transform step and it could certainly be
>>>broken down into component parts and joined back together.
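
The Option 2 call described above might be sketched roughly as follows. This is only an illustration under stated assumptions: the `Gene` record, the `genes_in_region` function, the map name `demo_map`, and the sample genes are all hypothetical and are not part of BioMOBY or any existing database API. A real service would sit behind a web service interface rather than a local function.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record: a position on a named map plus GO term IDs."""
    symbol: str
    map_name: str
    chromosome: str
    start: int
    stop: int
    go_terms: FrozenSet[str]


def genes_in_region(
    genes: Iterable[Gene],
    map_name: str,
    chromosome: str,
    start: int,
    stop: int,
    go_terms: Optional[Iterable[str]] = None,
) -> List[Gene]:
    """Return genes overlapping [start, stop] on the given map and chromosome.

    If go_terms is supplied, keep only genes annotated with at least one
    of those terms; otherwise return every gene in the region.
    """
    wanted = set(go_terms) if go_terms is not None else None
    hits = []
    for gene in genes:
        if gene.map_name != map_name or gene.chromosome != chromosome:
            continue
        # Skip genes that lie entirely outside the requested interval.
        if gene.stop < start or gene.start > stop:
            continue
        if wanted is not None and not (wanted & gene.go_terms):
            continue
        hits.append(gene)
    return hits


# Invented sample data, for illustration only.
GENES = [
    Gene("geneA", "demo_map", "1", 100, 200, frozenset({"GO:0006915"})),
    Gene("geneB", "demo_map", "1", 450, 600, frozenset({"GO:0005634"})),
    Gene("geneC", "demo_map", "1", 900, 1000, frozenset({"GO:0006915"})),
    Gene("geneD", "demo_map", "2", 150, 250, frozenset({"GO:0006915"})),
]

all_in_region = genes_in_region(GENES, "demo_map", "1", 50, 500)
filtered = genes_in_region(GENES, "demo_map", "1", 50, 500, ["GO:0006915"])
```

Here the region lookup and the GO filter are done in one call; as noted above, the same steps could equally be split into separate services (region to genes, genes to GO terms, filter) and chained together.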
>>>
>>>What do others think about this? Ultimately I'd love to see this
>>>genomic position based search expanded so I could pop a genome browser
>>>on top to display not every gene/feature/SNP etc. in a region but only
>>>those that match certain criteria - a genome-based search engine for
>>>the db.
>>>
>>>
>>>Simon.
>>>
>>>
>>>--------------------------------------------------------------------
>>>Simon Twigger, Ph.D.
>>>Assistant Professor, Bioinformatics Research Center
>>>
>>>Medical College of Wisconsin
>>>8701 Watertown Plank Road,
>>>Milwaukee, WI, 53226
>>>tel. 414-456-8802, fax 414-456-6595
>>>
>>