[Bioperl-l] Genes from MySQL database using Bio::DB::GFF

Thu Aug 17 05:20:45 UTC 2006

Many thanks Scott,

I will probably follow your suggestion and start using PostGres. Beside
being a different database engine, is their any big difference between using
PostGres and MySQL?

Many thanks for the help, I was starting to doubt my ability to code!!

Cheers,

Marco

On 8/16/06 10:11 PM, "Scott Cain" <cain.cshl at gmail.com> wrote:

> Hi Marco,
> 
> Well, it works for me :-)
> 
> I ran this script:
> 
> #!/usr/bin/perl  -w
> use strict;
> 
> use Bio::DB::GFF;
> my $db = Bio::DB::GFF->new( -adaptor => 'dbi::pg',
>                               -dsn => 'dbi:Pg:dbname=flybase');
> 
> my @feat = $db->get_feature_by_name('FBgn0025803');
> 
> for (@feat) {
>     print "$_\n" if ($_->method eq 'gene');
> }
> 
> and got one line:
> 
> gene:.(FBgn0025803)
> 
> The only real difference is that this in a PostgreSQL database and not
> MySQL.  I used Pg since I have that installed.  I'll blow away this
> database, install MySQL and see if that makes a difference (of course,
> it shouldn't, but you never know...)
> 
> Gaah!  I ran the exact same script with a mysql Bio::DB::GFF and got
> this out:
> 
> gene:.(FBgn0025803)
> gene:.(FBgn0025803)
> 
> Looks like a bug in the mysql adaptor.  I'll see if I can track it down;
> in the mean time, you could switch to a real database :-)
> 
> Scott
> 
> 
> 
> On Wed, 2006-08-16 at 23:30 -0400, Scott Cain wrote:
>> Hi Marco,
>> 
>> I'm working on it right now--my first guess (without doing any real
>> work), I'm betting on the problem being an incompatibility between the
>> GFF3 file and the Bio::DB::GFF schema.
>> 
>> Scott
>> 
>> 
>> On Wed, 2006-08-16 at 19:59 -0700, Marco Blanchette wrote:
>>> Dear all,
>>> 
>>> I am desperately trying to get a list of gene coordinates from a MySQL
>>> database version of the fly genome populated using the Bio::DB::GFF module.
>>> I have a list of 277 id in a text file that when parsed through the
>>> following script return 279 entries (2 more entries then the number of genes
>>> in the starting list).
>>> 
>>> Here is the script:
>>> 
>>> use Bio::DB::GFF;
>>> my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
>>>                               -dsn => 'dbi:mysql:database=dmel_43_new');
>>> while (<>){
>>>     chomp;
>>>     my @feat = $db->get_feature_by_name($_);
>>>     for my $f (@feat){
>>>         if ($f->type->method eq 'gene'){
>>>         print     "Name: ", $f->name,
>>>                 " Strand: ", $f->strand,
>>>                 " Start: ", $f->start,
>>>                 " End: ", $f->end,
>>>                 "\n";
>>>         }
>>>     }
>>> }
>>> 
>>> I totally don¹t understand where the 2 extra entries are coming from.
>>> Nothing differentiate them from each other. Moreover, when I double check
>>> the MySQL database, both genes are having only a single Œgene¹ entry in the
>>> fdata table.
>>> 
>>> Is there a bug in the way I am trying to fetch the individual genes or
>>> something is wrong with the latest Bio::DB::GFF module from the CVS
>>> repository?
>>> 
>>> Here is a test script and it¹s output that I am using to try to tract down
>>> what the problem is. Hope this could help:
>>> 
>>> use Bio::DB::GFF;
>>> my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
>>>                               -dsn => 'dbi:mysql:database=dmel_43_new');
>>> my %dups;
>>> my ($j, $i) =0;
>>> while (<>){
>>>     chomp;
>>>     my $id = $_;
>>>     my @feat = $db->get_feature_by_name($id);
>>>     my $feat_size = $#feat;
>>>     $j++ if $feat_size == 2;
>>>     
>>>     for my $f (@feat){
>>>         $i++;
>>>           
>>>         if (exists $dups{$f->group} && $f->type->method eq 'gene'){
>>>             print     "Calling >>>", $f->group,
>>>                         " ID=", $i,
>>>                         " from \@feat of size $feat_size",
>>>                         "\n";
>>>             print     "Chr: ", $f->refseq,
>>>                     " Strand: ", $f->strand,
>>>                     " Start: ", $f->start,
>>>                     " End: ", $f->end,
>>>                     "\n";
>>>             print "Offending >>>", $dups{$f->group}->[0]->group,
>>>                   " ID=", $dups{$f->group}->[1], "\n";
>>>             print     "Chr: ", $dups{$f->group}->[0]->refseq,
>>>                     " Strand: ", $dups{$f->group}->[0]->strand,
>>>                     " Start: ", $dups{$f->group}->[0]->start,
>>>                       " End: ", $dups{$f->group}->[0]->end;
>>>             print "\n\n";
>>>          } elsif ($f->type->method eq 'gene') {
>>>             $dups{$f->group} = [$f, $i];
>>>          }
>>>     }
>>> }
>>> 
>>> print "#### there was $j \@feat with only 2 features\n";
>>> 
>>> Output of the test script:
>>> 
>>> $ perl test.pl hrp36_targets.txt
>>> Calling >>>FBgn0025803 ID=98 from @feat of size 2
>>> Chr: 3R Strand: 1 Start: 16966463 End: 17038413
>>> Offending >>>FBgn0025803 ID=97
>>> Chr: 3R Strand: 1 Start: 16966463 End: 17038413
>>> 
>>> Calling >>>FBgn0025681 ID=304 from @feat of size 2
>>> Chr: 2L Strand: 1 Start: 2992964 End: 2998614
>>> Offending >>>FBgn0025681 ID=303
>>> Chr: 2L Strand: 1 Start: 2992964 End: 2998614
>>> 
>>> #### there was 11 @feat with only 2 features
>>> 
>>> With the hope someone can find out the problem...
>>> 
>>> Cheers,
>>> 
>>> Marco
>>> 
>>> ______________________________
>>> Marco Blanchette, Ph.D.
>>> 
>>> mblanche at uclink.berkeley.edu
>>> 
>>> Donald C. Rio's lab
>>> Department of Molecular and Cell Biology
>>> 16 Barker Hall
>>> University of California
>>> Berkeley, CA 94720-3204
>>> 
>>> Tel: (510) 642-1084
>>> Cell: (510) 847-0996
>>> Fax: (510) 642-6062
>> -- 
>> ------------------------------------------------------------------------
>> Scott Cain, Ph. D.                                   cain.cshl at gmail.com
>> GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
>> Cold Spring Harbor Laboratory
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Marco Blanchette, Ph.D.

mblanche at berkeley.edu

Donald C. Rio's lab
Department of Molecular and Cell Biology
16 Barker Hall
University of California
Berkeley, CA 94720-3204

Tel: (510) 642-1084
Cell: (510) 847-0996
Fax: (510) 642-6062