[Bioperl-l] Genes from MySQL database using Bio::DB::GFF
Marco Blanchette
mblanche at berkeley.edu
Thu Aug 17 05:20:45 UTC 2006
Many thanks Scott,
I will probably follow your suggestion and start using PostGres. Beside
being a different database engine, is their any big difference between using
PostGres and MySQL?
Many thanks for the help, I was starting to doubt my ability to code!!
Cheers,
Marco
On 8/16/06 10:11 PM, "Scott Cain" <cain.cshl at gmail.com> wrote:
> Hi Marco,
>
> Well, it works for me :-)
>
> I ran this script:
>
> #!/usr/bin/perl -w
> use strict;
>
> use Bio::DB::GFF;
> my $db = Bio::DB::GFF->new( -adaptor => 'dbi::pg',
> -dsn => 'dbi:Pg:dbname=flybase');
>
> my @feat = $db->get_feature_by_name('FBgn0025803');
>
> for (@feat) {
> print "$_\n" if ($_->method eq 'gene');
> }
>
> and got one line:
>
> gene:.(FBgn0025803)
>
> The only real difference is that this in a PostgreSQL database and not
> MySQL. I used Pg since I have that installed. I'll blow away this
> database, install MySQL and see if that makes a difference (of course,
> it shouldn't, but you never know...)
>
> Gaah! I ran the exact same script with a mysql Bio::DB::GFF and got
> this out:
>
> gene:.(FBgn0025803)
> gene:.(FBgn0025803)
>
> Looks like a bug in the mysql adaptor. I'll see if I can track it down;
> in the mean time, you could switch to a real database :-)
>
> Scott
>
>
>
> On Wed, 2006-08-16 at 23:30 -0400, Scott Cain wrote:
>> Hi Marco,
>>
>> I'm working on it right now--my first guess (without doing any real
>> work), I'm betting on the problem being an incompatibility between the
>> GFF3 file and the Bio::DB::GFF schema.
>>
>> Scott
>>
>>
>> On Wed, 2006-08-16 at 19:59 -0700, Marco Blanchette wrote:
>>> Dear all,
>>>
>>> I am desperately trying to get a list of gene coordinates from a MySQL
>>> database version of the fly genome populated using the Bio::DB::GFF module.
>>> I have a list of 277 id in a text file that when parsed through the
>>> following script return 279 entries (2 more entries then the number of genes
>>> in the starting list).
>>>
>>> Here is the script:
>>>
>>> use Bio::DB::GFF;
>>> my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
>>> -dsn => 'dbi:mysql:database=dmel_43_new');
>>> while (<>){
>>> chomp;
>>> my @feat = $db->get_feature_by_name($_);
>>> for my $f (@feat){
>>> if ($f->type->method eq 'gene'){
>>> print "Name: ", $f->name,
>>> " Strand: ", $f->strand,
>>> " Start: ", $f->start,
>>> " End: ", $f->end,
>>> "\n";
>>> }
>>> }
>>> }
>>>
>>> I totally don¹t understand where the 2 extra entries are coming from.
>>> Nothing differentiate them from each other. Moreover, when I double check
>>> the MySQL database, both genes are having only a single Œgene¹ entry in the
>>> fdata table.
>>>
>>> Is there a bug in the way I am trying to fetch the individual genes or
>>> something is wrong with the latest Bio::DB::GFF module from the CVS
>>> repository?
>>>
>>> Here is a test script and it¹s output that I am using to try to tract down
>>> what the problem is. Hope this could help:
>>>
>>> use Bio::DB::GFF;
>>> my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
>>> -dsn => 'dbi:mysql:database=dmel_43_new');
>>> my %dups;
>>> my ($j, $i) =0;
>>> while (<>){
>>> chomp;
>>> my $id = $_;
>>> my @feat = $db->get_feature_by_name($id);
>>> my $feat_size = $#feat;
>>> $j++ if $feat_size == 2;
>>>
>>> for my $f (@feat){
>>> $i++;
>>>
>>> if (exists $dups{$f->group} && $f->type->method eq 'gene'){
>>> print "Calling >>>", $f->group,
>>> " ID=", $i,
>>> " from \@feat of size $feat_size",
>>> "\n";
>>> print "Chr: ", $f->refseq,
>>> " Strand: ", $f->strand,
>>> " Start: ", $f->start,
>>> " End: ", $f->end,
>>> "\n";
>>> print "Offending >>>", $dups{$f->group}->[0]->group,
>>> " ID=", $dups{$f->group}->[1], "\n";
>>> print "Chr: ", $dups{$f->group}->[0]->refseq,
>>> " Strand: ", $dups{$f->group}->[0]->strand,
>>> " Start: ", $dups{$f->group}->[0]->start,
>>> " End: ", $dups{$f->group}->[0]->end;
>>> print "\n\n";
>>> } elsif ($f->type->method eq 'gene') {
>>> $dups{$f->group} = [$f, $i];
>>> }
>>> }
>>> }
>>>
>>> print "#### there was $j \@feat with only 2 features\n";
>>>
>>> Output of the test script:
>>>
>>> $ perl test.pl hrp36_targets.txt
>>> Calling >>>FBgn0025803 ID=98 from @feat of size 2
>>> Chr: 3R Strand: 1 Start: 16966463 End: 17038413
>>> Offending >>>FBgn0025803 ID=97
>>> Chr: 3R Strand: 1 Start: 16966463 End: 17038413
>>>
>>> Calling >>>FBgn0025681 ID=304 from @feat of size 2
>>> Chr: 2L Strand: 1 Start: 2992964 End: 2998614
>>> Offending >>>FBgn0025681 ID=303
>>> Chr: 2L Strand: 1 Start: 2992964 End: 2998614
>>>
>>> #### there was 11 @feat with only 2 features
>>>
>>> With the hope someone can find out the problem...
>>>
>>> Cheers,
>>>
>>> Marco
>>>
>>> ______________________________
>>> Marco Blanchette, Ph.D.
>>>
>>> mblanche at uclink.berkeley.edu
>>>
>>> Donald C. Rio's lab
>>> Department of Molecular and Cell Biology
>>> 16 Barker Hall
>>> University of California
>>> Berkeley, CA 94720-3204
>>>
>>> Tel: (510) 642-1084
>>> Cell: (510) 847-0996
>>> Fax: (510) 642-6062
>> --
>> ------------------------------------------------------------------------
>> Scott Cain, Ph. D. cain.cshl at gmail.com
>> GMOD Coordinator (http://www.gmod.org/) 216-392-3087
>> Cold Spring Harbor Laboratory
>>
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
Marco Blanchette, Ph.D.
mblanche at berkeley.edu
Donald C. Rio's lab
Department of Molecular and Cell Biology
16 Barker Hall
University of California
Berkeley, CA 94720-3204
Tel: (510) 642-1084
Cell: (510) 847-0996
Fax: (510) 642-6062
More information about the Bioperl-l
mailing list