[Bioperl-l] Extracting gene seq from Bio::DB::GFF
Marco Blanchette
mblanche at berkeley.edu
Fri Aug 11 21:36:27 UTC 2006
Many thanks Scott,
At the same time I got your email I was coming to the same conclusion as
you.
Now I have a stranger problem in my hands... My goal is quite simple, I try
to get the sequence of the genes back from the Bio::DB::GFF database loaded
on MySQL. The gene list is from a file with one gene id per per line. When I
run the following script:
use Bio::DB::GFF;
use Bio::SeqIO;
my $out = Bio::SeqIO->new( -fh => \*STDOUT,
-format => 'fasta');
my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
-dsn => 'dbi:mysql:database=dmel_43_new');
while (<>){
chomp;
my $id = $_;
my @feats = $db->get_feature_by_name($id);
for my $f (@feats){
$out->write_seq( $f->seq ) if $f->type =~/gene/;
}
}
I get more sequence back than the number of gene in my input file. I double
check there. Some of the duplicated entries are the same, some are not!
I double check and none of the the duplicated entry in the output are having
duplicated "gene" entry in the original gff files that I just loaded in the
MySQL database using bp_bulk_load_gff.pl.
Any one as an idea as to what is going with my script?
Marco
On 8/11/06 12:05, "Scott Cain" <cain at cshl.edu> wrote:
> Hi Marco,
>
> What you are getting from get_feature_by_name is a list of
> Bio::DB::GFF::Feature objects, which are Bio::SeqFeatureI objects. What
> you need are Bio::PrimarySeq objects. Fortunately,
> Bio::DB::GFF::Feature has a method to get a PrimarySeq out of it; that
> method is called seq.
>
> So, you should be able to change your line to
>
> $out->write_seq( $_->seq() ) for @feat;
>
> and it should work. Of course, I haven't test that to make sure that it
> does :-)
>
> Scott
>
>
> On Fri, 2006-08-11 at 11:30 -0700, Marco Blanchette wrote:
>> Dear all,
>>
>> I used to use this very simple script to extract the gene sequence as a
>> fasta flat file from a Bio::DB::GFF database containing the GadFly 4.3
>> annotations
>>
>> #!/usr/bin/perl
>>
>> use strict;
>> use warnings;
>> use Bio::DB::GFF;
>> use Bio::SeqIO;
>>
>> my $out = Bio::SeqIO->new( -fh => \*STDOUT,
>> -format => 'fasta');
>>
>> my $db = Bio::DB::GFF->new( -adaptor => 'dbi::mysql',
>> -dsn => 'dbi:mysql:database=dmel_43_LS');
>>
>> while (<>){
>> chomp;
>> my @feat = $db->get_feature_by_name($_);
>> $out->write_seq($_) for @feat;
>> }
>>
>> Somehow I now get the following output instead of the actual sequences:
>>> FBgn0024988 gene:.(FBgn0024988)
>> Bio::PrimarySeq=HASH(0x19fd3d8)
>>> FBgn0041184 gene:.(FBgn0041184)
>> Bio::PrimarySeq=HASH(0x19fa684)
>>> FBgn0033636 gene:.(FBgn0033636)
>> Bio::PrimarySeq=HASH(0x19e1908)
>>
>> What change and what would be the right way to get what I want?
>>
>> Many thanks
>>
>> Marco
>> ______________________________
>> Marco Blanchette, Ph.D.
>>
>> mblanche at uclink.berkeley.edu
>>
>> Donald C. Rio's lab
>> Department of Molecular and Cell Biology
>> 16 Barker Hall
>> University of California
>> Berkeley, CA 94720-3204
>>
>> Tel: (510) 642-1084
>> Cell: (510) 847-0996
>> Fax: (510) 642-6062
______________________________
Marco Blanchette, Ph.D.
mblanche at uclink.berkeley.edu
Donald C. Rio's lab
Department of Molecular and Cell Biology
16 Barker Hall
University of California
Berkeley, CA 94720-3204
Tel: (510) 642-1084
Cell: (510) 847-0996
Fax: (510) 642-6062
--
More information about the Bioperl-l
mailing list