[Bioperl-l] Database Retrieval

Tue Aug 8 12:49:23 UTC 2006

Most of the Bio::DB::* classes implement Bio::DB::RandomAccessI,  
which is the origin of the get_Seq_by* methods that Bio::DB::GenBank  
and others use.  You could create a set of modules that implement
an interface like RandomAccessI, grab the raw data on the backend
using a UCSC-specific DB handle (MySQL or otherwise) or a web
agent, and convert it into Bio* objects.
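A minimal sketch of that split between interface and backend follows. It is written in plain Perl so it runs without BioPerl installed. The package names My::SeqDB::RandomAccess and My::SeqDB::UCSC, the -backend argument and the stub data are all hypothetical. A real module would inherit from Bio::DB::RandomAccessI and return Bio::Seq objects instead of hashrefs, and its backend would wrap a DBI handle or web agent rather than an in-memory hash.

```perl
use strict;
use warnings;

# Hypothetical interface: borrows the get_Seq_by_* naming of
# Bio::DB::RandomAccessI, but is not that module.
package My::SeqDB::RandomAccess;

sub get_Seq_by_id  { my $class = ref($_[0]) || $_[0]; die "$class must implement get_Seq_by_id\n" }
sub get_Seq_by_acc { my $class = ref($_[0]) || $_[0]; die "$class must implement get_Seq_by_acc\n" }

# Hypothetical UCSC-backed implementation. The backend is any code ref
# mapping an accession to a raw record; in practice it might wrap a DBI
# handle on UCSC's MySQL server or a web agent.
package My::SeqDB::UCSC;
our @ISA = ('My::SeqDB::RandomAccess');

sub new {
    my ($class, %args) = @_;
    die "backend code ref required\n" unless ref $args{-backend} eq 'CODE';
    return bless { backend => $args{-backend} }, $class;
}

sub get_Seq_by_acc {
    my ($self, $acc) = @_;
    my $raw = $self->{backend}->($acc) or return;   # not found: return nothing
    # Stand-in for building a Bio::Seq from the raw record.
    return { display_id => $acc, seq => $raw->{seq} };
}

# In this sketch, IDs and accessions are treated as the same key.
sub get_Seq_by_id { my ($self, $id) = @_; return $self->get_Seq_by_acc($id) }

package main;

my %fake_table = (TEST_0001 => { seq => 'ATGACC' });   # illustrative data only
my $db  = My::SeqDB::UCSC->new(-backend => sub { $fake_table{ $_[0] } });
my $seq = $db->get_Seq_by_acc('TEST_0001');
print "$seq->{display_id}: $seq->{seq}\n";
```

Because the backend is passed in separately, the get_Seq_by* interface stays the same whether the records come from MySQL, the web or a test fixture.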

This is what Bio::DB::GenBank does.  It inherits from  
Bio::DB::NCBIHelper and Bio::DB::WebDBSeqI.  WebDBSeqI implements  
methods from RandomAccessI and adds a web agent; NCBIHelper inherits  
from WebDBSeqI and adds NCBI-specific parameters for remote access to
the Entrez protein and nucleotide databases.
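The same layering can be reduced to a plain-Perl sketch. A generic class implements get_Seq_by_acc once, and each subclass only adds request parameters through SUPER::. Every package name here is hypothetical. The db/format parameter names are illustrative and are not the parameters NCBIHelper actually sends. The agent is a code ref standing in for a real web agent; here it just serializes the request.

```perl
use strict;
use warnings;

# Generic layer (similar in role to WebDBSeqI): implements the lookup
# once and asks subclasses for their parameters via a hook.
package My::WebSeqDB;
sub new { my ($class, %args) = @_; return bless { %args }, $class }
sub default_params { return () }    # hook: each layer appends its own params
sub get_Seq_by_acc {
    my ($self, $acc) = @_;
    my %params = ($self->default_params, id => $acc);
    return $self->{agent}->(\%params);    # agent: stand-in for a web agent
}

# Site layer (similar in role to NCBIHelper): adds site-wide parameters.
package My::SiteHelper;
our @ISA = ('My::WebSeqDB');
sub default_params { my $self = shift; return ($self->SUPER::default_params, format => 'fasta') }

# Concrete layer (similar in role to Bio::DB::GenBank): fixes the database.
package My::SiteDB::Nucleotide;
our @ISA = ('My::SiteHelper');
sub default_params { my $self = shift; return ($self->SUPER::default_params, db => 'nucleotide') }

package main;

my $db = My::SiteDB::Nucleotide->new(
    agent => sub { my $p = shift; join '&', map { $_ . '=' . $p->{$_} } sort keys %$p },
);
print $db->get_Seq_by_acc('TEST_0001'), "\n";
```

With this structure, a UCSC module would only need its own site and concrete layers. The lookup logic in the generic class would be reused unchanged.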

Once the critical backend class is in place (remote or local access
to the database), an interface similar to Bio::DB::GenBank could be
designed on top of it.

Chris

On Aug 8, 2006, at 6:41 AM, Sean Davis wrote:

>
> On 8/8/06 5:21 AM, "Sendu Bala" <bix at sendu.me.uk> wrote:
>
>> Sean Davis wrote:
>>>
>>> On 8/7/06 1:53 PM, "Sendu Bala" <bix at sendu.me.uk> wrote:
>>>
>>>> Do you want to go ahead and look into making those classes for
>>>> accessing the common tables? It's in my plan to make various
>>>> aspects of genomic data retrieval a strength of bioperl as opposed
>>>> to a surprising missing link
>>>> (http://www.bioperl.org/wiki/Getting_Genomic_Sequences); I'll get
>>>> to that in a few weeks, but if you lay the groundwork, or better yet
>>>> complete everything before then, that would be great! :)
>>>
>>> So, there is a sketch of what things would look like here:
>>>
>>> http://watson.nci.nih.gov/~sdavis/Bio-DB-UCSC.tar.gz
>>
>> Thanks for that.
>>
>>
>>> It only includes the refLink and refFlat tables so far, but adding
>>> other tables is pretty straightforward, as you can see from the code.
>>> I would love to hear comments.  Basically, to use it, you can do
>>> something like what is shown in the synopsis; output is given below:
>>>
>>> NAME
>>>     Bio::DB::UCSC - Access UCSC MySQL tables nicely
>>>
>>> SYNOPSIS
>>>     use Bio::DB::UCSC::RefLink::Manager;
>>>
>>>     my $reflinks = Bio::DB::UCSC::RefLink::Manager->get_reflinks(
>>>         query => [ mrnaAcc => { like => 'NM_00002%' } ],
>>>     );
>>
>> I appreciate that this is due to the way Rose::DB works, but is it
>> possible to hide the SQL nature of what we're doing? Is it possible
>> to hide even the table names?
>>
>> Ideally the interface API would survive a complete change in UCSC's
>> table structures. The implementation would have to change, but user
>> code would not.
>>
>> Are you willing to take this on from your outline and develop a set
>> of more bioperlish modules? Even if you don't have time, your
>> contribution so far is certainly valuable, so thank you.
>>
>> I envisage that Bio::DB::UCSC.pm would be the easy-to-use starting
>> point, presenting a code interface similar to the UCSC Table Browser
>> web interface. And while it would be implemented using various
>> submodules, even UCSC.pm would be protected from SQL and table changes.
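One way such protection could work is sketched below. It assumes that UCSC.pm keeps a private map from user-facing field names to the current table and column names. The %FIELD map and build_query are hypothetical. The refLink columns used (mrnaAcc, geneName, product) are the schema names this map would have to track, so they are exactly the part expected to change. The result is SQL with placeholders plus bind values, suitable for a DBI statement handle. Only single-table queries are handled.

```perl
use strict;
use warnings;

# Hypothetical schema map: user-facing field names on the left, current
# UCSC table/column names on the right. If UCSC renames a table or
# column, only this map should need to change, not calling code.
my %FIELD = (
    accession => [ refLink => 'mrnaAcc'  ],
    gene      => [ refLink => 'geneName' ],
    product   => [ refLink => 'product'  ],
);

# Translate conditions written with logical field names into SQL with
# placeholders plus bind values. Each condition is a one-pair hashref:
# { like => ... } or { eq => ... }.
sub build_query {
    my (%where) = @_;
    die "No conditions given\n" unless %where;
    my (%tables, @clauses, @bind);
    for my $field (sort keys %where) {
        my $map = $FIELD{$field} or die "Unknown field '$field'\n";
        my ($table, $column) = @$map;
        $tables{$table} = 1;
        my ($op, $value) = %{ $where{$field} };
        die "Each condition needs one 'like' or 'eq' operator\n"
            unless defined $op && ($op eq 'like' || $op eq 'eq');
        push @clauses, $op eq 'like' ? "$column LIKE ?" : "$column = ?";
        push @bind, $value;
    }
    # Joins across tables are out of scope for this sketch.
    die "Conditions span several tables\n" if scalar(keys %tables) > 1;
    my ($table) = keys %tables;
    return ("SELECT * FROM $table WHERE " . join(' AND ', @clauses), @bind);
}

my ($sql, @bind) = build_query(accession => { like => 'NM_00002%' });
print "$sql\n@bind\n";
```

This expresses the same query as the synopsis, but the caller never names refLink or mrnaAcc.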
>
> That is certainly possible; this is Perl, right?  I'll think about it,
> but I doubt that I have the time to put together a satisfactory "grand"
> solution that allows arbitrary queries without specifying SQL, returns
> bioperl objects, and hides the underlying schema.  If one settles on a
> set of objects that one wants to return, the process will be easier,
> but that limits the information that one can get from the database.
>
> Practically, a "table-browser-like" code interface will require
> exposing some of the SQL schema, as column names and table names will
> need to come into it.  Taking such an approach, either based on RDBO or
> with hand-coded SQL management, precludes returning bioperl-type
> objects.  On the other hand, if one wants only bioperl-type objects
> returned, the information that can be returned is quite limited, and
> the query structure (from a Perl point of view) will need to be limited
> to a set of fields that can ultimately be used to look up only the
> information associated with bioperl objects.
>
> I think the table-browser-like approach is the better way to start;
> let the user make bioperl objects as he/she sees fit once the data is
> back.  As a second round of development, one could certainly build a
> compatibility layer that uses the primary query engine to pull out
> information for constructing key bioperl objects, but I think that
> should be a secondary goal rather than the primary one.
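A sketch of that second layer follows. It assumes the primary query engine returns each refFlat row as a hashref keyed by column name. refflat_row_to_gene and the layout of the hash it returns are hypothetical stand-ins for building Bio::SeqFeature objects. The coordinate shift reflects UCSC's 0-based, half-open convention versus BioPerl's 1-based, closed one.

```perl
use strict;
use warnings;

# Hypothetical compatibility layer: turn one refFlat row, as a
# table-browser-style query would return it, into a simple gene model
# using BioPerl-style conventions (1-based coordinates, strand as +1/-1).
sub refflat_row_to_gene {
    my ($row) = @_;
    # UCSC stores 0-based, half-open coordinates; BioPerl uses 1-based,
    # closed ones, so starts shift by one and ends stay the same.
    my @starts = map { $_ + 1 } grep { length } split /,/, $row->{exonStarts};
    my @ends   = grep { length } split /,/, $row->{exonEnds};
    die "exon start/end counts differ\n" unless @starts == @ends;
    return {
        gene   => $row->{geneName},
        id     => $row->{name},
        seq_id => $row->{chrom},
        strand => $row->{strand} eq '-' ? -1 : 1,
        start  => $row->{txStart} + 1,
        end    => $row->{txEnd},
        exons  => [ map { [ $starts[$_], $ends[$_] ] } 0 .. $#starts ],
    };
}

# Primary-layer output: a plain row keyed by column name (made-up values).
my $row = {
    geneName => 'GENE1', name => 'NM_TEST', chrom => 'chr1', strand => '-',
    txStart => 1000, txEnd => 5000,
    exonStarts => '1000,3000,', exonEnds => '1500,5000,',
};
my $gene = refflat_row_to_gene($row);
printf "%s %s:%d-%d strand %d, %d exons\n",
    $gene->{id}, $gene->{seq_id}, $gene->{start}, $gene->{end},
    $gene->{strand}, scalar @{ $gene->{exons} };
```

Keeping this conversion outside the query engine matches the ordering proposed above. Raw rows remain available to users who want every column, and object construction sits on top as an optional layer.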
>
> All that said, I think some more discussion, with some judicious code
> examples (even if WAY off track, as mine probably is), is needed
> before settling on a path forward.
>
> Sean

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign