[Bioperl-l] Genes from MySQL database using Bio::DB::GFF

Scott Cain cain.cshl at gmail.com
Thu Aug 17 06:17:08 UTC 2006


Um, no idea.  I haven't even tried it myself yet (which is why I didn't
answer Chris' question about it a few days ago).  Sorry.

On Wed, 2006-08-16 at 23:09 -0700, Marco Blanchette wrote:
> Gnarl... my problem with Bio::DB::SeqFeature (and its bp_seqfeature_load.pl script) is that it doesn't integrate the fasta sequences database yet (at least from what I can tell from the version installed on my workstation).
> What is the expected timeline for Bio::DB::SeqFeature, and how stable and reliable is the latest version?
> Many thanks for your help Scott,
> Marco
> 
> On 8/16/06 10:54 PM, "Scott Cain" <cain.cshl at gmail.com> wrote:
> > Marco,
> >
> > After stepping my script through the debugger, I am pretty sure that
> > this really does come down to the incompatibilities between the
> > Bio::DB::GFF schema and some GFF3 files.  In this case, amusingly
> > enough, Lincoln's efforts to make the Bio::DB::GFF mysql adaptor
> > compatible with GFF3 have led to this bug, whereas I didn't do the same
> > for the Postgres adaptor.  Unfortunately, I can't guarantee you that if
> > you were to switch to Postgres that it would work because it may miss
> > cases that the MySQL adaptor is getting.
> >
> > You could try Bio::DB::SeqFeature (loaded with bp_seqfeature_load.pl)
> > which was designed to work with GFF3 files.  Welcome to the bleeding
> > edge :-)
> >
> > Scott
> >
> > On Wed, 2006-08-16 at 22:20 -0700, Marco Blanchette wrote:
> > > Many thanks Scott,
> > > I will probably follow your suggestion and start using Postgres.
> > > Besides being a different database engine, is there any big
> > > difference between using Postgres and MySQL?
> > > Many thanks for the help, I was starting to doubt my ability to code!!
> > > Cheers,
> > >
> > > Marco
> > >
> > > On 8/16/06 10:11 PM, "Scott Cain" <cain.cshl at gmail.com> wrote:
> > > > Hi Marco,
> > > >
> > > > Well, it works for me :-)
> > > >
> > > > I ran this script:
> > > >
> > > > #!/usr/bin/perl -w
> > > > use strict;
> > > >
> > > > use Bio::DB::GFF;
> > > > my $db = Bio::DB::GFF->new( -adaptor => 'dbi::pg',
> > > >                             -dsn => 'dbi:Pg:dbname=flybase');
> > > >
> > > > my @feat = $db->get_feature_by_name('FBgn0025803');
> > > >
> > > > for (@feat) {
> > > >     print "$_\n" if ($_->method eq 'gene');
> > > > }
> > > >
> > > > and got one line:
> > > >
> > > > gene:.(FBgn0025803)
> > > >
> > > > The only real difference is that this is in a PostgreSQL database
> > > > and not MySQL.  I used Pg since I have that installed.  I'll blow
> > > > away this database, install MySQL and see if that makes a
> > > > difference (of course, it shouldn't, but you never know...)
> > > >
> > > > Gaah!  I ran the exact same script with a mysql Bio::DB::GFF and
> > > > got this out:
> > > >
> > > > gene:.(FBgn0025803)
> > > > gene:.(FBgn0025803)
> > > >
> > > > Looks like a bug in the mysql adaptor.  I'll see if I can track it
> > > > down; in the mean time, you could switch to a real database :-)
> > > >
> > > > Scott
> > > >
> > > > On Wed, 2006-08-16 at 23:30 -0400, Scott Cain wrote:
> > > > > Hi Marco,
> > > > >
> > > > > I'm working on it right now--my first guess (without doing any
> > > > > real work), I'm betting on the problem being an incompatibility
> > > > > between the GFF3 file and the Bio::DB::GFF schema.
> > > > >
> > > > > Scott
> > > > >
> > > > > On Wed, 2006-08-16 at 19:59 -0700, Marco Blanchette wrote:
> > > > > > Dear all,
> > > > > >
> > > > > > I am desperately trying to get a list of gene coordinates from
> > > > > > a MySQL database version of the fly genome populated using the
> > > > > > Bio::DB::GFF module. I have a list of 277 IDs in a text file
> > > > > > that, when parsed through the following script, returns 279
> > > > > > entries (2 more entries than the number of genes in the
> > > > > > starting list).
> > > > > >
> > > > > > Here is the script:
> > > > > >
> > > > > > use Bio::DB::GFF;
> > > > > > my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
> > > > > >                             -dsn => 'dbi:mysql:database=dmel_43_new');
> > > > > > while (<>){
> > > > > >     chomp;
> > > > > >     my @feat = $db->get_feature_by_name($_);
> > > > > >     for my $f (@feat){
> > > > > >         if ($f->type->method eq 'gene'){
> > > > > >             print "Name: ", $f->name,
> > > > > >                   " Strand: ", $f->strand,
> > > > > >                   " Start: ", $f->start,
> > > > > >                   " End: ", $f->end,
> > > > > >                   "\n";
> > > > > >         }
> > > > > >     }
> > > > > > }
> > > > > >
> > > > > > I totally don't understand where the 2 extra entries are coming
> > > > > > from. Nothing differentiates them from each other. Moreover,
> > > > > > when I double check the MySQL database, both genes have only a
> > > > > > single 'gene' entry in the fdata table.
> > > > > >
> > > > > > Is there a bug in the way I am trying to fetch the individual
> > > > > > genes, or is something wrong with the latest Bio::DB::GFF
> > > > > > module from the CVS repository?
> > > > > >
> > > > > > Here is a test script and its output that I am using to try to
> > > > > > track down what the problem is. Hope this could help:
> > > > > >
> > > > > > use Bio::DB::GFF;
> > > > > > my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
> > > > > >                             -dsn => 'dbi:mysql:database=dmel_43_new');
> > > > > > my %dups;
> > > > > > my ($j, $i) = (0, 0);
> > > > > > while (<>){
> > > > > >     chomp;
> > > > > >     my $id = $_;
> > > > > >     my @feat = $db->get_feature_by_name($id);
> > > > > >     my $feat_size = $#feat;
> > > > > >     $j++ if $feat_size == 2;
> > > > > >
> > > > > >     for my $f (@feat){
> > > > > >         $i++;
> > > > > >
> > > > > >         if (exists $dups{$f->group} && $f->type->method eq 'gene'){
> > > > > >             print "Calling >>>", $f->group,
> > > > > >                   " ID=", $i,
> > > > > >                   " from \@feat of size $feat_size",
> > > > > >                   "\n";
> > > > > >             print "Chr: ", $f->refseq,
> > > > > >                   " Strand: ", $f->strand,
> > > > > >                   " Start: ", $f->start,
> > > > > >                   " End: ", $f->end,
> > > > > >                   "\n";
> > > > > >             print "Offending >>>", $dups{$f->group}->[0]->group,
> > > > > >                   " ID=", $dups{$f->group}->[1], "\n";
> > > > > >             print "Chr: ", $dups{$f->group}->[0]->refseq,
> > > > > >                   " Strand: ", $dups{$f->group}->[0]->strand,
> > > > > >                   " Start: ", $dups{$f->group}->[0]->start,
> > > > > >                   " End: ", $dups{$f->group}->[0]->end;
> > > > > >             print "\n\n";
> > > > > >         } elsif ($f->type->method eq 'gene') {
> > > > > >             $dups{$f->group} = [$f, $i];
> > > > > >         }
> > > > > >     }
> > > > > > }
> > > > > >
> > > > > > print "#### there was $j \@feat with only 2 features\n";
> > > > > >
> > > > > > Output of the test script:
> > > > > >
> > > > > > $ perl test.pl hrp36_targets.txt
> > > > > > Calling >>>FBgn0025803 ID=98 from @feat of size 2
> > > > > > Chr: 3R Strand: 1 Start: 16966463 End: 17038413
> > > > > > Offending >>>FBgn0025803 ID=97
> > > > > > Chr: 3R Strand: 1 Start: 16966463 End: 17038413
> > > > > >
> > > > > > Calling >>>FBgn0025681 ID=304 from @feat of size 2
> > > > > > Chr: 2L Strand: 1 Start: 2992964 End: 2998614
> > > > > > Offending >>>FBgn0025681 ID=303
> > > > > > Chr: 2L Strand: 1 Start: 2992964 End: 2998614
> > > > > >
> > > > > > #### there was 11 @feat with only 2 features
> > > > > >
> > > > > > With the hope someone can find out the problem...
> > > > > >
> > > > > > Cheers,
> > > > > >
> > > > > > Marco
> > > > > >
> > > > > > ______________________________
> > > > > > Marco Blanchette, Ph.D.
> > > > > >
> > > > > > mblanche at uclink.berkeley.edu
> > > > > >
> > > > > > Donald C. Rio's lab
> > > > > > Department of Molecular and Cell Biology
> > > > > > 16 Barker Hall
> > > > > > University of California
> > > > > > Berkeley, CA 94720-3204
> > > > > >
> > > > > > Tel: (510) 642-1084
> > > > > > Cell: (510) 847-0996
> > > > > > Fax: (510) 642-6062
> > > > > --
> > > > > ------------------------------------------------------------------------
> > > > > Scott Cain, Ph. D.                                   cain.cshl at gmail.com
> > > > > GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
> > > > > Cold Spring Harbor Laboratory
> > > > >
> > > > > _______________________________________________
> > > > > Bioperl-l mailing list
> > > > > Bioperl-l at lists.open-bio.org
> > > > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> > >
> > > Marco Blanchette, Ph.D.
> > > mblanche at berkeley.edu
> > > Donald C. Rio's lab
> > > Department of Molecular and Cell Biology
> > > 16 Barker Hall
> > > University of California
> > > Berkeley, CA 94720-3204
> > > Tel: (510) 642-1084
> > > Cell: (510) 847-0996
> > > Fax: (510) 642-6062
> > >
> > > _______________________________________________
> > > Bioperl-l mailing list
> > > Bioperl-l at lists.open-bio.org
> > > http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
> Marco Blanchette, Ph.D.
> mblanche at berkeley.edu
> Donald C. Rio's lab
> Department of Molecular and Cell Biology
> 16 Barker Hall
> University of California
> Berkeley, CA 94720-3204
> Tel: (510) 642-1084
> Cell: (510) 847-0996
> Fax: (510) 642-6062
> 
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory
