[Bioperl-l] Genes from MySQL database using Bio::DB::GFF
Marco Blanchette
mblanche at berkeley.edu
Thu Aug 17 06:09:25 UTC 2006
Gnarl... my problem with Bio::DB::SeqFeature (and its
bp_seqfeature_load.pl script) is that it doesn't integrate the FASTA
sequence database yet (at least from what I can tell from the version installed on
my workstation).
What is the expected timeline for Bio::DB::SeqFeature, and how stable and
reliable is the latest version?
Many thanks for your help, Scott,
Marco
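For comparison, here is a minimal sketch of how the gene-coordinate lookup from the quoted script might look against Bio::DB::SeqFeature::Store, the database that bp_seqfeature_load.pl populates. It assumes the store provides get_features_by_name and that the returned features implement the standard Bio::SeqFeatureI accessors (primary_tag, seq_id, strand, start, end); the helper name gene_coords is hypothetical, and the sketch should be treated as untested against a real FlyBase load.

```perl
use strict;
use warnings;

# Hypothetical helper: for each name, return [name, seq_id, strand, start, end]
# for every feature whose primary tag is 'gene'.
# Assumption: $db is a Bio::DB::SeqFeature::Store handle (or any object with
# a get_features_by_name method returning Bio::SeqFeatureI-style features).
sub gene_coords {
    my ($db, @names) = @_;
    my @rows;
    for my $name (@names) {
        for my $f ($db->get_features_by_name($name)) {
            next unless $f->primary_tag eq 'gene';   # skip mRNAs, exons, etc.
            push @rows, [ $name, $f->seq_id, $f->strand, $f->start, $f->end ];
        }
    }
    return @rows;
}
```

A caller would pass a handle created with something like Bio::DB::SeqFeature::Store->new(-adaptor => 'DBI::mysql', -dsn => 'dbi:mysql:dmel') (the database name is a placeholder) together with the IDs read from the input file, then print each row.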
On 8/16/06 10:54 PM, "Scott Cain" <cain.cshl at gmail.com> wrote:
> Marco,
>
> After stepping my script through the debugger, I am pretty sure that
> this really does come down to the incompatibilities between the
> Bio::DB::GFF schema and some GFF3 files. In this case, amusingly
enough, Lincoln's efforts to make the Bio::DB::GFF MySQL adaptor
compatible with GFF3 have led to this bug, whereas I didn't do the same
for the Postgres adaptor. Unfortunately, I can't guarantee that
switching to Postgres would work for you, because the Postgres adaptor
may miss cases that the MySQL adaptor handles.
>
> You could try Bio::DB::SeqFeature (loaded with bp_seqfeature_load.pl)
> which was designed to work with GFF3 files. Welcome to the bleeding
> edge :-)
>
> Scott
>
>
> On Wed, 2006-08-16 at 22:20 -0700, Marco Blanchette wrote:
>> Many thanks Scott,
>> I will probably follow your suggestion and start using Postgres. Besides being
>> a different database engine, is there any big difference between
>> using Postgres and MySQL?
>> Many thanks for the help, I was starting to doubt my ability to code!!
>> Cheers,
>>
>> Marco
>>
>> On 8/16/06 10:11 PM, "Scott Cain" <cain.cshl at gmail.com> wrote:
>>> Hi Marco,
>>>
>>> Well, it works for me :-)
>>>
>>> I ran this script:
>>>
>>> #!/usr/bin/perl -w
>>> use strict;
>>>
>>> use Bio::DB::GFF;
>>> my $db = Bio::DB::GFF->new( -adaptor => 'dbi::pg',
>>>                             -dsn     => 'dbi:Pg:dbname=flybase');
>>>
>>> my @feat = $db->get_feature_by_name('FBgn0025803');
>>>
>>> for (@feat) {
>>>     print "$_\n" if ($_->method eq 'gene');
>>> }
>>>
>>> and got one line:
>>>
>>> gene:.(FBgn0025803)
>>>
>>> The only real difference is that this is in a PostgreSQL database and not
>>> MySQL. I used Pg since I have that installed. I'll blow away this
>>> database, install MySQL and see if that makes a difference (of course, it
>>> shouldn't, but you never know...)
>>>
>>> Gaah! I ran the exact same script with a MySQL Bio::DB::GFF and got
>>> this output:
>>>
>>> gene:.(FBgn0025803)
>>> gene:.(FBgn0025803)
>>>
>>> Looks like a bug in the MySQL adaptor. I'll see if I can track it down;
>>> in the meantime, you could switch to a real database :-)
>>>
>>> Scott
>>>
>>> On Wed, 2006-08-16 at 23:30 -0400, Scott Cain wrote:
>>>> Hi Marco,
>>>>
>>>> I'm working on it right now. My first guess (without doing any real
>>>> work): I'm betting the problem is an incompatibility between the
>>>> GFF3 file and the Bio::DB::GFF schema.
>>>>
>>>> Scott
>>>>
>>>> On Wed, 2006-08-16 at 19:59 -0700, Marco Blanchette wrote:
>>>>> Dear all,
>>>>>
>>>>> I am desperately trying to get a list of gene coordinates from a MySQL
>>>>> database of the fly genome populated using the Bio::DB::GFF module.
>>>>> I have a list of 277 IDs in a text file that, when run through the
>>>>> following script, returns 279 entries (2 more entries than the number of
>>>>> genes in the starting list).
>>>>>
>>>>> Here is the script:
>>>>>
>>>>> use strict;
>>>>> use warnings;
>>>>> use Bio::DB::GFF;
>>>>>
>>>>> my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
>>>>>                             -dsn     => 'dbi:mysql:database=dmel_43_new');
>>>>> # read one gene ID per line; print the coordinates of each 'gene' feature
>>>>> while (<>){
>>>>>     chomp;
>>>>>     my @feat = $db->get_feature_by_name($_);
>>>>>     for my $f (@feat){
>>>>>         if ($f->type->method eq 'gene'){
>>>>>             print "Name: ", $f->name,
>>>>>                   " Strand: ", $f->strand,
>>>>>                   " Start: ", $f->start,
>>>>>                   " End: ", $f->end,
>>>>>                   "\n";
>>>>>         }
>>>>>     }
>>>>> }
>>>>>
>>>>> I totally don't understand where the 2 extra entries are coming from.
>>>>> Nothing differentiates them from each other. Moreover, when I double-check
>>>>> the MySQL database, each of these genes has only a single 'gene' entry in
>>>>> the fdata table.
>>>>>
>>>>> Is there a bug in the way I am trying to fetch the individual genes, or is
>>>>> something wrong with the latest Bio::DB::GFF module from the CVS
>>>>> repository?
>>>>>
>>>>> Here is a test script I am using to try to track down the problem, along
>>>>> with its output. I hope this helps:
>>>>>
>>>>> use strict;
>>>>> use warnings;
>>>>> use Bio::DB::GFF;
>>>>>
>>>>> my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
>>>>>                             -dsn     => 'dbi:mysql:database=dmel_43_new');
>>>>> my %dups;             # group => [first 'gene' feature seen, its running index]
>>>>> my ($j, $i) = (0, 0); # $j: IDs where $#feat == 2; $i: running feature counter
>>>>> while (<>){
>>>>>     chomp;
>>>>>     my $id = $_;
>>>>>     my @feat = $db->get_feature_by_name($id);
>>>>>     # NB: $#feat is the last index of @feat, i.e. one less than the
>>>>>     # number of features returned
>>>>>     my $feat_size = $#feat;
>>>>>     $j++ if $feat_size == 2;
>>>>>
>>>>>     for my $f (@feat){
>>>>>         $i++;
>>>>>
>>>>>         if (exists $dups{$f->group} && $f->type->method eq 'gene'){
>>>>>             print "Calling >>>", $f->group,
>>>>>                   " ID=", $i,
>>>>>                   " from \@feat of size $feat_size",
>>>>>                   "\n";
>>>>>             print "Chr: ", $f->refseq,
>>>>>                   " Strand: ", $f->strand,
>>>>>                   " Start: ", $f->start,
>>>>>                   " End: ", $f->end,
>>>>>                   "\n";
>>>>>             print "Offending >>>", $dups{$f->group}->[0]->group,
>>>>>                   " ID=", $dups{$f->group}->[1], "\n";
>>>>>             print "Chr: ", $dups{$f->group}->[0]->refseq,
>>>>>                   " Strand: ", $dups{$f->group}->[0]->strand,
>>>>>                   " Start: ", $dups{$f->group}->[0]->start,
>>>>>                   " End: ", $dups{$f->group}->[0]->end;
>>>>>             print "\n\n";
>>>>>         } elsif ($f->type->method eq 'gene') {
>>>>>             # remember the first 'gene' feature seen for this group
>>>>>             $dups{$f->group} = [$f, $i];
>>>>>         }
>>>>>     }
>>>>> }
>>>>>
>>>>> print "#### there was $j \@feat with only 2 features\n";
>>>>>
>>>>> Output of the test script:
>>>>>
>>>>> $ perl test.pl hrp36_targets.txt
>>>>> Calling >>>FBgn0025803 ID=98 from @feat of size 2
>>>>> Chr: 3R Strand: 1 Start: 16966463 End: 17038413
>>>>> Offending >>>FBgn0025803 ID=97
>>>>> Chr: 3R Strand: 1 Start: 16966463 End: 17038413
>>>>>
>>>>> Calling >>>FBgn0025681 ID=304 from @feat of size 2
>>>>> Chr: 2L Strand: 1 Start: 2992964 End: 2998614
>>>>> Offending >>>FBgn0025681 ID=303
>>>>> Chr: 2L Strand: 1 Start: 2992964 End: 2998614
>>>>>
>>>>> #### there was 11 @feat with only 2 features
>>>>>
>>>>> Hoping someone can find the problem...
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Marco
>>>>>
>>>>> ______________________________
>>>>> Marco Blanchette, Ph.D.
>>>>>
>>>>> mblanche at uclink.berkeley.edu
>>>>>
>>>>> Donald C. Rio's lab
>>>>> Department of Molecular and Cell Biology
>>>>> 16 Barker Hall
>>>>> University of California
>>>>> Berkeley, CA 94720-3204
>>>>>
>>>>> Tel: (510) 642-1084
>>>>> Cell: (510) 847-0996
>>>>> Fax: (510) 642-6062
>>>> --
>>>> ------------------------------------------------------------------------
>>>> Scott Cain, Ph. D. cain.cshl at gmail.com
>>>> GMOD Coordinator (http://www.gmod.org/) 216-392-3087
>>>> Cold Spring Harbor Laboratory
>>>>
>>>> _______________________________________________
>>>> Bioperl-l mailing list
>>>> Bioperl-l at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>>
>> Marco Blanchette, Ph.D.
>> mblanche at berkeley.edu
>> Donald C. Rio's lab
>> Department of Molecular and Cell Biology
>> 16 Barker Hall
>> University of California
>> Berkeley, CA 94720-3204
>> Tel: (510) 642-1084
>> Cell: (510) 847-0996
>> Fax: (510) 642-6062
>>
Marco Blanchette, Ph.D.
mblanche at berkeley.edu
Donald C. Rio's lab
Department of Molecular and Cell Biology
16 Barker Hall
University of California
Berkeley, CA 94720-3204
Tel: (510) 642-1084
Cell: (510) 847-0996
Fax: (510) 642-6062
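Until the MySQL adaptor bug is fixed, one possible client-side workaround is to drop repeated features before printing them. The sketch below assumes that features sharing the same group, reference sequence, start, end and strand are true duplicates (as in the test output above, where both copies of FBgn0025803 and FBgn0025681 are identical); the helper name unique_features is hypothetical, and it only masks the adaptor problem rather than fixing it.

```perl
use strict;
use warnings;

# Hypothetical workaround: keep only the first feature for each
# (group, refseq, start, end, strand) combination, preserving input order.
# Assumes Bio::DB::GFF::Feature-style objects exposing the accessors used in
# the quoted test script; object values (e.g. group) are stringified into the key.
sub unique_features {
    my @features = @_;
    my %seen;
    return grep {
        my $f     = $_;
        my @parts = ($f->group, $f->refseq, $f->start, $f->end, $f->strand);
        my $key   = join "\t", map { defined $_ ? "$_" : '' } @parts;
        !$seen{$key}++;    # true only the first time this key is seen
    } @features;
}
```

In the original loop this would mean iterating over unique_features(@feat) instead of @feat.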