[Bioperl-l] question of seq() method of Bio::DB::GFF and Bio::PrimarySeq
Zhan, Shuai
Shuai.Zhan at umassmed.edu
Thu Jan 27 05:21:41 UTC 2011
Dear BIOPERL developers,
I'm a post doc in UMASS Medical School.
Recently I carried out some genome-wide annotation work, but faced some difficulty with using Bio::DB::GFF to fetch DNA sequences, whose seq() method is required by another critical softwares.
I found the key issue is that the method of seq() of Bio::DB::GFF (inherited from Bio::PrimarySeq?) can't exactly return the actual sequence string but something like reference. This question made me upset for several days and stoped my progress, but I have not found any related help from google or my friends. So I'm assuming you could give me some valuable information.
My perl is 5.8.8 and system is CentOS 5.3.
I loaded gff file with annotation information both of CDS and reference by bp_load_gff.pl. The raw fasta sequence file was also loaded together.
I wrote a test script just like your instruction on website.
[code]
use Bio::DB::GFF;
my $db = Bio::DB::GFF->new(-dsn => 'dbi:mysql:brenneri', -user => 'me', -password => '123',);
my $contig0 = $db->segment(-class => 'Sequence', -name => 'Contig0');
my $subcontig = $contig0->subseq(1,100);
print "returned by seq(): ", $upstream->seq, "\n";
print "returned by dna(): ", $upstream->dna, "\n\n";
my $transcripts = $db->segment(-class => 'Transcript', -name => 'fgp17221.t1');
my @exons = $transcripts->features('CDS');
print "returned by seq(): ", $exons[0]->seq "\n";
print "returned by seq(): ", $exons[0]->dna "\n\n";
my $upstream = $exons[0]->subseq(-20,0);
print "returned by seq(): ", $upstream->seq "\n";
print "returned by seq(): ", $upstream->dna "\n\n";
[/code]
The result as follows:
$perl test.pl
returned by seq(): Bio::PrimarySeq=HASH(0x1afa40d0)
returned by dna(): GTAGATCACTTTTTATTCTCGAAGAATAATTTTTGAGCTATGTAAAAGCGTGTATACCTCCCAGTTTTCCGGTTAAAAGTCCAGGATCGCATGTCTCATG
returned by seq(): Bio::PrimarySeq=HASH(0x1afa3ab0)
returned by dna(): AAGGAGCTCAACTTCAAACACCAAACACTTTGAACCAACATTTTTCTGGAGAAAACGTAAGAATTGAAAAGTCAAGAATGAATCCACATGTGAAAATTGAACAGAGAT
CAATGGCAATGGGTGGAGGAGGAGGGGATGAACAAATGAAAAACTTTACGGAAATGACGAATGAAGAACTCCGTGAGCGGTTAATGAAAATGCAAATGGATATGCAGAATCTTC
AAATGGCAATGG
returned by seq(): Bio::PrimarySeq=HASH(0x1afa3fb0)
returned by dna(): GGAGTTTGGACCGTGTTTCAG
Because the method of dna() could exactly return the actual sequence, so I think my loaded database should work. But I have no idea of why the method seq() missed the sequence.
I'd greatly appreciate any help.
Sincerly,
Shuai
Jan 27 2010
364 Plantation Str.
MA 01605
UMASS MED
More information about the Bioperl-l
mailing list