[Bioperl-l] Database Retrieval
Hilmar Lapp
hlapp at gmx.net
Tue Aug 8 03:54:16 UTC 2006
Sean,
would it be possible to return standard bioperl objects, like
Bio:SeqI objects, or Bio::Annotation::Reference, Bio::LocationI, etc?
-hilmar
On Aug 7, 2006, at 6:44 PM, Sean Davis wrote:
>
>
>
> On 8/7/06 1:53 PM, "Sendu Bala" <bix at sendu.me.uk> wrote:
>
>> Sean Davis wrote:
>>>
>>> On 8/7/06 11:51 AM, "Sendu Bala" <bix at sendu.me.uk> wrote:
>>>
>>>> Actually, would there be any interest in a bioperl interface to
>>>> the UCSC
>>>> tables? It could probably be done using their DAS server, and
>>>> end up as
>>>> a very easy-to-use alternative to the Ensembl API for limited or
>>>> one-off
>>>> queries.
>>>
>>> I haven't used their DAS server in a couple of years--it is
>>> probably worth
>>> my trying it again at some point. I have gone the route of
>>> maintaining a
>>> local ucsc mirror of the database. There are obvious advantages
>>> to doing
>>> this, including speed of access, no access limits, and the power and
>>> flexibility of SQL. With the simplicity of MySQL, one can even
>>> rsync
>>> directory from UCSC's mysql data directory directly, eliminating
>>> the need to
>>> do a table import step. I haven't gone so far as to implement a
>>> perl-based
>>> wrapper, but one could envision doing so for the most commonly-
>>> used tables,
>>> perhaps using Rose::DB::Object or DBIx::Class as a base with
>>> other methods
>>> added as needed based on the data contained in the table.
>>
>> Yes, using MySQL would probably be better. Any interface would
>> ideally
>> use the UCSC MySQL server by default (with enforced X second waits),
>> with an option to set the address to your own server.
>
> This is the route I have gone.
>
>> Do you want to go ahead and look into making those classes for
>> accessing
>> the common tables? It's in my plan to make various aspects of genomic
>> data retrieval a strength of bioperl as opposed to a surprising
>> missing
>> link (http://www.bioperl.org/wiki/Getting_Genomic_Sequences); I'll
>> get
>> to that in a few weeks but if you lay the ground work or better yet
>> complete everything before then that would be great! :)
>
> So, there is a sketch of what things would look like here:
>
> http://watson.nci.nih.gov/~sdavis/Bio-DB-UCSC.tar.gz
>
> You can install it using the usual:
>
> perl Makefile.PL
> make test
> make install
>
> You will need to have installed:
>
> Rose::DB::Object
>
> As this is what ORM I am using. It is very crude and only includes
> the
> refLink and refFlat tables so far, but adding other tables is pretty
> straightforward, as you can see from the code. I would love to hear
> comments. Basically, to use, you can do something like that shown
> in the
> synopsis and output is given below:
>
> NAME
> Bio::DB::UCSC - Access UCSC MySQL tables nicely
>
> SYNOPSIS
> use Bio::DB::UCSC::RefLink::Manager;
>
> my $reflinks = Bio::DB::UCSC::RefLink::Manager->get_reflinks(
> query => [
> mrnaAcc => {like => 'NM_00002%'},
> ],
> );
>
> foreach my $reflink (@$reflinks) {
> print "Accession: ",$reflink->mrnaAcc,"\n";
> print " Gene ID: ",$reflink->locusLinkId,"\n";
> print " Locations: \n";
> # and get all locations for each reflink
> # reflink table is related to refflat table
> my $refflats = $reflink->refflats;
> foreach my $refflat (@$refflats) {
> print " Chrom: ",$refflat->chrom,
> " Transcription Start: ",$refflat-
> >txStart,"\n";
> }
> }
>
> OUTPUT:
> Accession: NM_000020
> Gene ID: 94
> Locations:
> Chrom: chr12 Transcription Start: 50587468
> Accession: NM_000026
> Gene ID: 158
> Locations:
> Chrom: chr22 Transcription Start: 39072508
> Accession: NM_000022
> Gene ID: 100
> Locations:
> Chrom: chr20 Transcription Start: 42681577
> Accession: NM_000027
> Gene ID: 175
> Locations:
> Chrom: chr4 Transcription Start: 178588917
> Accession: NM_000028
> Gene ID: 178
> Locations:
> Chrom: chr1 Transcription Start: 100088632
> Accession: NM_000023
> Gene ID: 6442
> Locations:
> Chrom: chr17 Transcription Start: 45598389
> Accession: NM_000029
> Gene ID: 183
> Locations:
> Chrom: chr1 Transcription Start: 228904891
> Accession: NM_000025
> Gene ID: 155
> Locations:
> Chrom: chr8 Transcription Start: 37939672
> Accession: NM_000021
> Gene ID: 5663
> Locations:
> Chrom: chr14 Transcription Start: 72672931
> Accession: NM_000024
> Gene ID: 154
> Locations:
> Chrom: chr5 Transcription Start: 148186368
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
--
===========================================================
: Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
===========================================================
More information about the Bioperl-l
mailing list