[Bioperl-l] Still trouble with remote joined records
Chris Fields
cjfields at uiuc.edu
Thu Oct 18 03:45:55 UTC 2007
On Oct 17, 2007, at 9:49 PM, Chris Fields wrote:
> ...
>> So, as far as I can see, passing the DB handle isn't causing the
>> spliced_seq method to go elsewhere for the nucleic acid sequence
>> data.
>>
>> I thought that was the purpose.
>>
>> Can anyone enlighten me, or is this a bug?
>>
>> Warren Gallin
>
> This appears to be a bug. The remote sequence is designated in the
> feature:
>
> CDS join(430..1544,AF166008.1:345..507,AF166008.1:8976..9071,
> AF166008.1:9952..10044,AF166008.1:13222..13469,
> AF166008.1:15123..15300)
> /gene="KCND2"
> /codon_start=1
> /product="voltage-gated potassium channel Kv4.2"
> /protein_id="AAF65618.1"
> /db_xref="GI:7648673"
>
> but the location is truncated when passed through SeqIO to genbank
> output, which explains the spliced_seq() problem:
>
> CDS 430..1544
> /db_xref="GI:7648673"
> /codon_start=1
> /protein_id="AAF65618.1"
> /gene="KCND2"
>
> I'll try looking into this; if I can't get to it immediately I'll
> file a bug report. Thanks!
>
> chris
I looked into it, and this isn't a problem with BioPerl; there
appears to be an error on NCBI's end with some full GenBank seqs that
have remote locations. The default return type for records using
Bio::DB::GenBank is 'gbwithparts' (which retrieves full records for
everything), but the 'gbwithparts' version of this particular record
has a truncated location. You can see the truncated version here via
eutils:
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&retmode=text&id=7648671&rettype=gbwithparts
You can get around this in your script by changing the requested
format to 'gb', which has the correct location string and returns the
full protein seq:
my $gbh = Bio::DB::GenBank->new(-format => 'gb');
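
For context, here is a rough sketch of how that handle might be used
end to end. It assumes the record can be fetched by the GI from the
eutils URL above (7648671) via get_Seq_by_id(), and that the same
handle is passed positionally to spliced_seq() so the remote
AF166008.1 segments can be retrieved. It needs network access and is
untested against the current NCBI service, so treat it as illustrative
only:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::GenBank;

# Ask for plain 'gb' records rather than the default 'gbwithparts'.
my $gbh = Bio::DB::GenBank->new(-format => 'gb');

# Assumption: 7648671 is the GI taken from the eutils URL in this thread.
my $seq = $gbh->get_Seq_by_id('7648671')
    or die "Could not retrieve record 7648671\n";

for my $feat ($seq->get_SeqFeatures) {
    next unless $feat->primary_tag eq 'CDS';

    # Show the location as parsed, to check that the join(...) survived.
    print "CDS location: ", $feat->location->to_FTstring, "\n";

    # Pass the database handle so remote sub-locations can be fetched.
    my $cds = $feat->spliced_seq($gbh);

    print "Spliced length: ", $cds->length, "\n";
    print "Translation: ", $cds->translate->seq, "\n";
}
```

If the printed location is still just 430..1544, the record came back
truncated and spliced_seq() will only return the local segment.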
chris