[Bioperl-l] Bio::DB::GFF modules
Lincoln Stein
lstein@cshl.org
Tue, 10 Jul 2001 11:04:12 -0400
Hi All,
I'm very sorry for committing a big wad of mostly undocumented code to
the repository. I'm working to rectify that today.
In case anyone's wondering what it is I'm working on, it's a
lightweight data access layer for sequence annotations. The idea is
to support both ACeDB databases and relational annotation databases
using a set of adaptors and aggregators.
The synopsis will give you a good idea of what it's all about:
  use Bio::DB::GFF;

  # Open the sequence database
  my $db = Bio::DB::GFF->new( -adaptor => 'dbi:mysql',
                              -dsn     => 'dbi:mysql:elegans42');

  # fetch a 1 megabase segment of sequence starting at landmark "ZK909"
  my $segment = $db->segment('ZK909', 1 => 1000000);

  # pull out all transcript features
  my @transcripts = $segment->features('transcript');

  # for each transcript, total the length of the introns
  my %totals;
  for my $t (@transcripts) {
    my @introns = $t->Intron;
    $totals{$t->name} += $_->length foreach @introns;
  }

  # Sort the exons of the first transcript by position
  my @exons = sort {$a->start <=> $b->start} $transcripts[0]->Exon;

  # Get a region 1000 bp upstream of first exon
  my $upstream = $exons[0]->segment(-1000,0);

  # get its DNA
  my $dna = $upstream->dna;

  # and get all curated polymorphisms inside it
  my @polymorphisms = $upstream->contained_features('polymorphism:curated');

  # get all feature types in the database
  my @types = $db->types;

  # last example: count all feature types that overlap the segment
  my %type_counts = $segment->types(-enumerate=>1);
The module started out as a rewrite of Ace::Sequence, so the API still
varies from FeatureI in a few minor ways (using stop() instead of
end() for example). This will be fixed.
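One way to offer both names while the API converges, as a minimal sketch only: the My::Feature class below is a hypothetical stand-in, not part of Bio::DB::GFF, and it simply makes end() a synonym for stop() so that callers written against either convention get the same coordinate.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a feature class whose native accessor is stop().
# This is NOT the Bio::DB::GFF implementation, just an illustration.
package My::Feature;

sub new {
    my ($class, %args) = @_;
    return bless { start => $args{start}, stop => $args{stop} }, $class;
}

sub start { $_[0]{start} }
sub stop  { $_[0]{stop} }

# FeatureI-style synonym: end() delegates to stop().
sub end { $_[0]->stop }

package main;

my $f = My::Feature->new(start => 100, stop => 250);
printf "start=%d stop=%d end=%d\n", $f->start, $f->stop, $f->end;
```

Delegating rather than copying the value keeps a single source of truth, so a subclass that overrides stop() changes end() as well.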
Lincoln
--
========================================================================
Lincoln D. Stein Cold Spring Harbor Laboratory
lstein@cshl.org Cold Spring Harbor, NY
========================================================================