[Bioperl-l] Genes from MySQL database using Bio::DB::GFF
Marco Blanchette
mblanche at berkeley.edu
Thu Aug 17 02:59:04 UTC 2006
Dear all,
I am desperately trying to get a list of gene coordinates from a MySQL
database version of the fly genome populated using the Bio::DB::GFF module.
I have a list of 277 id in a text file that when parsed through the
following script return 279 entries (2 more entries then the number of genes
in the starting list).
Here is the script:
use Bio::DB::GFF;
my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
-dsn => 'dbi:mysql:database=dmel_43_new');
while (<>){
chomp;
my @feat = $db->get_feature_by_name($_);
for my $f (@feat){
if ($f->type->method eq 'gene'){
print "Name: ", $f->name,
" Strand: ", $f->strand,
" Start: ", $f->start,
" End: ", $f->end,
"\n";
}
}
}
I totally don¹t understand where the 2 extra entries are coming from.
Nothing differentiate them from each other. Moreover, when I double check
the MySQL database, both genes are having only a single gene¹ entry in the
fdata table.
Is there a bug in the way I am trying to fetch the individual genes or
something is wrong with the latest Bio::DB::GFF module from the CVS
repository?
Here is a test script and it¹s output that I am using to try to tract down
what the problem is. Hope this could help:
use Bio::DB::GFF;
my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
-dsn => 'dbi:mysql:database=dmel_43_new');
my %dups;
my ($j, $i) =0;
while (<>){
chomp;
my $id = $_;
my @feat = $db->get_feature_by_name($id);
my $feat_size = $#feat;
$j++ if $feat_size == 2;
for my $f (@feat){
$i++;
if (exists $dups{$f->group} && $f->type->method eq 'gene'){
print "Calling >>>", $f->group,
" ID=", $i,
" from \@feat of size $feat_size",
"\n";
print "Chr: ", $f->refseq,
" Strand: ", $f->strand,
" Start: ", $f->start,
" End: ", $f->end,
"\n";
print "Offending >>>", $dups{$f->group}->[0]->group,
" ID=", $dups{$f->group}->[1], "\n";
print "Chr: ", $dups{$f->group}->[0]->refseq,
" Strand: ", $dups{$f->group}->[0]->strand,
" Start: ", $dups{$f->group}->[0]->start,
" End: ", $dups{$f->group}->[0]->end;
print "\n\n";
} elsif ($f->type->method eq 'gene') {
$dups{$f->group} = [$f, $i];
}
}
}
print "#### there was $j \@feat with only 2 features\n";
Output of the test script:
$ perl test.pl hrp36_targets.txt
Calling >>>FBgn0025803 ID=98 from @feat of size 2
Chr: 3R Strand: 1 Start: 16966463 End: 17038413
Offending >>>FBgn0025803 ID=97
Chr: 3R Strand: 1 Start: 16966463 End: 17038413
Calling >>>FBgn0025681 ID=304 from @feat of size 2
Chr: 2L Strand: 1 Start: 2992964 End: 2998614
Offending >>>FBgn0025681 ID=303
Chr: 2L Strand: 1 Start: 2992964 End: 2998614
#### there was 11 @feat with only 2 features
With the hope someone can find out the problem...
Cheers,
Marco
______________________________
Marco Blanchette, Ph.D.
mblanche at uclink.berkeley.edu
Donald C. Rio's lab
Department of Molecular and Cell Biology
16 Barker Hall
University of California
Berkeley, CA 94720-3204
Tel: (510) 642-1084
Cell: (510) 847-0996
Fax: (510) 642-6062
--
More information about the Bioperl-l
mailing list