[Bioperl-l] question of seq() method of Bio::DB::GFF and Bio::PrimarySeq

Zhan, Shuai Shuai.Zhan at umassmed.edu
Thu Jan 27 05:21:41 UTC 2011


Dear BIOPERL developers,

I'm a post doc in UMASS Medical School.

Recently I carried out some genome-wide annotation work, but faced some difficulty with using Bio::DB::GFF to fetch DNA sequences, whose seq() method is required by another critical softwares.

I found the key issue is that the method of seq() of Bio::DB::GFF (inherited from Bio::PrimarySeq?) can't exactly return the actual sequence string but something like reference. This question made me upset for several days and stoped my progress, but I have not found any related help from google or my friends. So I'm assuming you could give me some valuable information.

My perl is 5.8.8 and system is CentOS 5.3.
I loaded gff file with annotation information both of CDS and reference by bp_load_gff.pl. The raw fasta sequence file was also loaded together.

I wrote a test script just like your instruction on website.
[code]
use Bio::DB::GFF;
my $db = Bio::DB::GFF->new(-dsn => 'dbi:mysql:brenneri', -user => 'me', -password => '123',);
my $contig0 = $db->segment(-class => 'Sequence', -name => 'Contig0');
my $subcontig = $contig0->subseq(1,100);
print "returned by seq(): ", $upstream->seq, "\n";
print "returned by dna(): ", $upstream->dna, "\n\n";
my $transcripts = $db->segment(-class => 'Transcript', -name => 'fgp17221.t1');
my @exons = $transcripts->features('CDS');
print "returned by seq(): ", $exons[0]->seq "\n";
print "returned by seq(): ", $exons[0]->dna "\n\n";
my $upstream = $exons[0]->subseq(-20,0);
print "returned by seq(): ", $upstream->seq "\n";
print "returned by seq(): ", $upstream->dna "\n\n";
[/code]

The result as follows:
$perl test.pl
returned by seq(): Bio::PrimarySeq=HASH(0x1afa40d0)
returned by dna(): GTAGATCACTTTTTATTCTCGAAGAATAATTTTTGAGCTATGTAAAAGCGTGTATACCTCCCAGTTTTCCGGTTAAAAGTCCAGGATCGCATGTCTCATG

returned by seq(): Bio::PrimarySeq=HASH(0x1afa3ab0)
returned by dna(): AAGGAGCTCAACTTCAAACACCAAACACTTTGAACCAACATTTTTCTGGAGAAAACGTAAGAATTGAAAAGTCAAGAATGAATCCACATGTGAAAATTGAACAGAGAT
CAATGGCAATGGGTGGAGGAGGAGGGGATGAACAAATGAAAAACTTTACGGAAATGACGAATGAAGAACTCCGTGAGCGGTTAATGAAAATGCAAATGGATATGCAGAATCTTC
AAATGGCAATGG

returned by seq(): Bio::PrimarySeq=HASH(0x1afa3fb0)
returned by dna(): GGAGTTTGGACCGTGTTTCAG

Because the method of dna() could exactly return the actual sequence, so I think my loaded database should work. But I have no idea of why the method seq() missed the sequence.

I'd greatly appreciate any help.

Sincerly,
Shuai
Jan 27 2010

364 Plantation Str.
MA 01605
UMASS MED




More information about the Bioperl-l mailing list