[Bioperl-l] Still trouble with remote joined records

Chris Fields cjfields at uiuc.edu
Thu Oct 18 03:45:55 UTC 2007


On Oct 17, 2007, at 9:49 PM, Chris Fields wrote:

> ...
>> So, as far as I can see, passing the DB handle isn't causing the
>> spliced_seq method to go elsewhere for the nucleic acid sequence  
>> data.
>>
>> I thought that was the purpose.
>>
>> Can anyone enlighten me, or is this a bug?
>>
>> Warren Gallin
>
> This appears to be a bug.  The remote sequence is designated in the
> feature:
>
>       CDS             join(430..1544,AF166008.1:345..507,AF166008.1:8976..9071,
>                       AF166008.1:9952..10044,AF166008.1:13222..13469,
>                       AF166008.1:15123..15300)
>                       /gene="KCND2"
>                       /codon_start=1
>                       /product="voltage-gated potassium channel Kv4.2"
>                       /protein_id="AAF65618.1"
>                       /db_xref="GI:7648673"
>
> but the location is truncated when passed through SeqIO to genbank
> output, which explains the spliced_seq() problem:
>
>       CDS             430..1544
>                       /db_xref="GI:7648673"
>                       /codon_start=1
>                       /protein_id="AAF65618.1"
>                       /gene="KCND2"
>
> I'll try looking into this; if I can't get to it immediately I'll
> file a bug report.  Thanks!
>
> chris

I looked into it, and there isn't a problem with BioPerl. There
appears to be an error on NCBI's end with some full GenBank seqs that
have remote locations.  The default return type for records using
Bio::DB::GenBank is 'gbwithparts' (which retrieves full records for
everything), but for this particular record that format comes back with
a truncated location.  You can see the truncated version here via eutils:

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&retmode=text&id=7648671&rettype=gbwithparts

You can get around this in your script by changing the requested
format to 'gb'.  That format has the correct location string, so
spliced_seq() returns the full CDS and you get the full protein seq:

my $gbh = Bio::DB::GenBank->new(-format => 'gb');
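
In context, a minimal sketch of the whole workaround might look like the
following.  It assumes that 7648671 (the id in the efetch URL above) is the
GI of the record in question, that get_Seq_by_gi() is the right way to fetch
it, and that spliced_seq() accepts the handle as a named -db argument in your
BioPerl version.  I haven't run it against the live server, so treat it as a
starting point rather than a tested script:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Bio::DB::GenBank;

# Request plain 'gb' records instead of the default 'gbwithparts'.
my $gbh = Bio::DB::GenBank->new(-format => 'gb');

# Assumption: 7648671 is the GI of the record (taken from the efetch URL).
my $seq = $gbh->get_Seq_by_gi('7648671')
    or die "Could not retrieve record\n";

for my $feat ($seq->get_SeqFeatures) {
    next unless $feat->primary_tag eq 'CDS';

    # With 'gb' this should show the full join(...), including the
    # remote AF166008.1 segments, rather than just 430..1544.
    print "Location: ", $feat->location->to_FTstring, "\n";

    # Pass the DB handle so the remote segments can be fetched.
    my $cds = $feat->spliced_seq(-db => $gbh);
    print "Protein:  ", $cds->translate->seq, "\n";
}
```

If the printed location is still only 430..1544, the record was retrieved
with the truncated location and the translation will be short as well.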

chris


