[Bioperl-l] Database Retrieval

Tue Aug 8 03:54:16 UTC 2006

Sean,

would it be possible to return standard bioperl objects, like  
Bio:SeqI objects, or Bio::Annotation::Reference, Bio::LocationI, etc?

	-hilmar

On Aug 7, 2006, at 6:44 PM, Sean Davis wrote:

>
>
>
> On 8/7/06 1:53 PM, "Sendu Bala" <bix at sendu.me.uk> wrote:
>
>> Sean Davis wrote:
>>>
>>> On 8/7/06 11:51 AM, "Sendu Bala" <bix at sendu.me.uk> wrote:
>>>
>>>> Actually, would there be any interest in a bioperl interface to  
>>>> the UCSC
>>>> tables? It could probably be done using their DAS server, and  
>>>> end up as
>>>> a very easy-to-use alternative to the Ensembl API for limited or  
>>>> one-off
>>>> queries.
>>>
>>> I haven't used their DAS server in a couple of years--it is  
>>> probably worth
>>> my trying it again at some point.  I have gone the route of  
>>> maintaining a
>>> local ucsc mirror of the database.  There are obvious advantages  
>>> to doing
>>> this, including speed of access, no access limits, and the power and
>>> flexibility of SQL.  With the simplicity of MySQL, one can even  
>>> rsync
>>> directory from UCSC's mysql data directory directly, eliminating  
>>> the need to
>>> do a table import step.  I haven't gone so far as to implement a  
>>> perl-based
>>> wrapper, but one could envision doing so for the most commonly- 
>>> used tables,
>>> perhaps using Rose::DB::Object or DBIx::Class as a base with  
>>> other methods
>>> added as needed based on the data contained in the table.
>>
>> Yes, using MySQL would probably be better. Any interface would  
>> ideally
>> use the UCSC MySQL server by default (with enforced X second waits),
>> with an option to set the address to your own server.
>
> This is the route I have gone.
>
>> Do you want to go ahead and look into making those classes for  
>> accessing
>> the common tables? It's in my plan to make various aspects of genomic
>> data retrieval a strength of bioperl as opposed to a surprising  
>> missing
>> link (http://www.bioperl.org/wiki/Getting_Genomic_Sequences); I'll  
>> get
>> to that in a few weeks but if you lay the ground work or better yet
>> complete everything before then that would be great! :)
>
> So, there is a sketch of what things would look like here:
>
> http://watson.nci.nih.gov/~sdavis/Bio-DB-UCSC.tar.gz
>
> You can install it using the usual:
>
>  perl Makefile.PL
>  make test
>  make install
>
> You will need to have installed:
>
>  Rose::DB::Object
>
> As this is what ORM I am using.  It is very crude and only includes  
> the
> refLink and refFlat tables so far, but adding other tables is pretty
> straightforward, as you can see from the code.  I would love to hear
> comments.  Basically, to use, you can do something like that shown  
> in the
> synopsis and output is given below:
>
> NAME
>     Bio::DB::UCSC - Access UCSC MySQL tables nicely
>
> SYNOPSIS
>       use Bio::DB::UCSC::RefLink::Manager;
>
>       my $reflinks = Bio::DB::UCSC::RefLink::Manager->get_reflinks(
>           query => [
>                     mrnaAcc => {like => 'NM_00002%'},
>                    ],
>           );
>
>       foreach my $reflink (@$reflinks) {
>           print "Accession:  ",$reflink->mrnaAcc,"\n";
>           print "    Gene ID:  ",$reflink->locusLinkId,"\n";
>           print "    Locations: \n";
>           # and get all locations for each reflink
>           # reflink table is related to refflat table
>           my $refflats = $reflink->refflats;
>           foreach my $refflat (@$refflats) {
>               print "    Chrom: ",$refflat->chrom,
>                     "    Transcription Start:  ",$refflat- 
> >txStart,"\n";
>           }
>       }
>
> OUTPUT:
> Accession:  NM_000020
>     Gene ID:  94
>     Locations:
>     Chrom: chr12    Transcription Start:  50587468
> Accession:  NM_000026
>     Gene ID:  158
>     Locations:
>     Chrom: chr22    Transcription Start:  39072508
> Accession:  NM_000022
>     Gene ID:  100
>     Locations:
>     Chrom: chr20    Transcription Start:  42681577
> Accession:  NM_000027
>     Gene ID:  175
>     Locations:
>     Chrom: chr4    Transcription Start:  178588917
> Accession:  NM_000028
>     Gene ID:  178
>     Locations:
>     Chrom: chr1    Transcription Start:  100088632
> Accession:  NM_000023
>     Gene ID:  6442
>     Locations:
>     Chrom: chr17    Transcription Start:  45598389
> Accession:  NM_000029
>     Gene ID:  183
>     Locations:
>     Chrom: chr1    Transcription Start:  228904891
> Accession:  NM_000025
>     Gene ID:  155
>     Locations:
>     Chrom: chr8    Transcription Start:  37939672
> Accession:  NM_000021
>     Gene ID:  5663
>     Locations:
>     Chrom: chr14    Transcription Start:  72672931
> Accession:  NM_000024
>     Gene ID:  154
>     Locations:
>     Chrom: chr5    Transcription Start:  148186368
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>

-- 
===========================================================
: Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
===========================================================