[Bioperl-l] How does bioperl handle Genbank sequence records?
Danny Yoo
dyoo@acoma.Stanford.EDU
Thu, 25 Jul 2002 15:37:02 -0700 (PDT)
Hi everyone,
I'm using Bioperl 1.0 to parse out some sequences from GenBank. In
particular, I've been parsing:
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=nucleotide&list_uids=13449290&dopt=GenBank
As I was going through the 'RPL2' gene and cds feature, I noticed a
discrepency in sequence lengths. Here's the relevant coordinate records
in Genbank:
gene 154744..157345
/gene="rpl2"
mRNA join(154744..155660,157213..157345)
/gene="rpl2"
CDS join(154744..155660,157213..157345)
/gene="rpl2"
How does Bioperl handle the 'join(154744..15660,157213..157345)'
description when extracting sequences?
Here's a test program I wrote to double check things:
######
use strict;
use Bio::DB::GenBank;
sub getRPL2 {
my $gb = Bio::DB::GenBank->new();
my $seqobj = $gb->get_Seq_by_acc('NC_001284');
for my $feat ($seqobj->top_SeqFeatures()) {
if ($feat->primary_tag eq 'CDS'
&& $feat->gff_string =~ /gene "?rpl2"?/) {
return $feat;
}
}
}
my $rpl2 = getRPL2();
my $rpl2_seq = $rpl2->seq->seq;
print length($rpl2_seq), "\n";
######
When I run this program:
###
$ perl test_genbank.pl
-------------------- WARNING ---------------------
MSG: [NC_001284] is not a normal sequence database but a RefSeq entry.
Redirecting the request.
---------------------------------------------------
-------------------- WARNING ---------------------
MSG: Unbalanced quote in:
/gene="orf153a"
/codon_start=1
/protein_id="NP_085474.1"
/db_xref="GI:13449291"
/db_xref="SPTR
/translation="MSLLFQQTVPLSHLHRSLDPPLCFRTHILLILLLLSRHLPGFTG
SDCESADPSIVSAIAPGTATTSERDCPVRTAGSDPVPIGDSGTFFDVGTAAPELLSPN
RHHMITRAKDGIRKPNPRYNLFTQKYTPSEPKTITSASQDGDKLCKKRCRH"
No further qualifiers will be added for this feature
---------------------------------------------------
2602
###
I get a sequence of length 2602. Am I using this correctly?
It appears that Bioperl is grabbing the sequence from '154744..157345'
(whose length is 157345 - 154744 + 1), rather than what I had expected
from 'join(154744..155660,157213..157345)' of length 1051.
Thank you for any help you can give about this.