[Bioperl-l] Database Retrieval
Sean Davis
sdavis2 at mail.nih.gov
Mon Aug 7 22:44:54 UTC 2006
On 8/7/06 1:53 PM, "Sendu Bala" <bix at sendu.me.uk> wrote:
> Sean Davis wrote:
>>
>> On 8/7/06 11:51 AM, "Sendu Bala" <bix at sendu.me.uk> wrote:
>>
>>> Actually, would there be any interest in a bioperl interface to the UCSC
>>> tables? It could probably be done using their DAS server, and end up as
>>> a very easy-to-use alternative to the Ensembl API for limited or one-off
>>> queries.
>>
>> I haven't used their DAS server in a couple of years--it is probably worth
>> my trying it again at some point. I have gone the route of maintaining a
>> local ucsc mirror of the database. There are obvious advantages to doing
>> this, including speed of access, no access limits, and the power and
>> flexibility of SQL. With the simplicity of MySQL, one can even rsync
>> directory from UCSC's mysql data directory directly, eliminating the need to
>> do a table import step. I haven't gone so far as to implement a perl-based
>> wrapper, but one could envision doing so for the most commonly-used tables,
>> perhaps using Rose::DB::Object or DBIx::Class as a base with other methods
>> added as needed based on the data contained in the table.
>
> Yes, using MySQL would probably be better. Any interface would ideally
> use the UCSC MySQL server by default (with enforced X second waits),
> with an option to set the address to your own server.
This is the route I have gone.
> Do you want to go ahead and look into making those classes for accessing
> the common tables? It's in my plan to make various aspects of genomic
> data retrieval a strength of bioperl as opposed to a surprising missing
> link (http://www.bioperl.org/wiki/Getting_Genomic_Sequences); I'll get
> to that in a few weeks but if you lay the ground work or better yet
> complete everything before then that would be great! :)
So, there is a sketch of what things would look like here:
http://watson.nci.nih.gov/~sdavis/Bio-DB-UCSC.tar.gz
You can install it using the usual:
perl Makefile.PL
make test
make install
You will need to have installed:
Rose::DB::Object
As this is what ORM I am using. It is very crude and only includes the
refLink and refFlat tables so far, but adding other tables is pretty
straightforward, as you can see from the code. I would love to hear
comments. Basically, to use, you can do something like that shown in the
synopsis and output is given below:
NAME
Bio::DB::UCSC - Access UCSC MySQL tables nicely
SYNOPSIS
use Bio::DB::UCSC::RefLink::Manager;
my $reflinks = Bio::DB::UCSC::RefLink::Manager->get_reflinks(
query => [
mrnaAcc => {like => 'NM_00002%'},
],
);
foreach my $reflink (@$reflinks) {
print "Accession: ",$reflink->mrnaAcc,"\n";
print " Gene ID: ",$reflink->locusLinkId,"\n";
print " Locations: \n";
# and get all locations for each reflink
# reflink table is related to refflat table
my $refflats = $reflink->refflats;
foreach my $refflat (@$refflats) {
print " Chrom: ",$refflat->chrom,
" Transcription Start: ",$refflat->txStart,"\n";
}
}
OUTPUT:
Accession: NM_000020
Gene ID: 94
Locations:
Chrom: chr12 Transcription Start: 50587468
Accession: NM_000026
Gene ID: 158
Locations:
Chrom: chr22 Transcription Start: 39072508
Accession: NM_000022
Gene ID: 100
Locations:
Chrom: chr20 Transcription Start: 42681577
Accession: NM_000027
Gene ID: 175
Locations:
Chrom: chr4 Transcription Start: 178588917
Accession: NM_000028
Gene ID: 178
Locations:
Chrom: chr1 Transcription Start: 100088632
Accession: NM_000023
Gene ID: 6442
Locations:
Chrom: chr17 Transcription Start: 45598389
Accession: NM_000029
Gene ID: 183
Locations:
Chrom: chr1 Transcription Start: 228904891
Accession: NM_000025
Gene ID: 155
Locations:
Chrom: chr8 Transcription Start: 37939672
Accession: NM_000021
Gene ID: 5663
Locations:
Chrom: chr14 Transcription Start: 72672931
Accession: NM_000024
Gene ID: 154
Locations:
Chrom: chr5 Transcription Start: 148186368
More information about the Bioperl-l
mailing list