[Bioperl-l] Still trouble with remote joined records

Thu Oct 18 02:49:37 UTC 2007

On Oct 17, 2007, at 8:42 PM, Warren Gallin wrote:

> I must be missing something, but I cannot get the procedure outlined
> in FAQ 5.5 to do what I think it should (maybe my expectations are
> incorrect).
>
> I think I have an up-to-date release:
>
> GallinPowerbook:~ wgallin$ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
> 1.005002102
> ...
> I get an error that a protein sequence can not be translated.  I
> thought that by feeding the handle to GenBank into the spliced_seq
> method that it would retrieve the necessary nucleic acid sequence
> records and splice together the specified ranges.

No, the error is expected:

MSG: Can't translate an amino acid sequence.

The record in question is a protein record, so you are retrieving a  
protein sequence, which can't be translated (the exception is valid,  
in other words).  Note that the 'CDS' feature in this case has a  
specified location of 1..630 (indicated by arrow):

      CDS             1..630   <---------
                      /gene="KCND2"
                      /coded_by="join(AF166007.1:430..1544,
                      AF166008.1:345..507,AF166008.1:8976..9071,
                      AF166008.1:9952..10044,AF166008.1:13222..13469,
                      AF166008.1:15123..15300)"

The tag name 'coded_by' has the data you want; however it is stored  
as a string only.
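
Since the value is only a string, one way to get at the coordinates is to
parse it by hand. What follows is a minimal sketch in plain Perl with no
BioPerl calls. The sub name parse_coded_by is made up for illustration, and
the code assumes a flat join(...) of ACC.version:start..end items, with no
complement() and no fuzzy positions:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): split a /coded_by value into
# segments. Assumes a flat join(...) of "ACC.ver:start..end" items, or a
# single such item; complement() and fuzzy (<, >) positions are not handled.
sub parse_coded_by {
    my ($str) = @_;
    $str =~ s/\s+//g;                  # drop whitespace left by line wrapping
    $str =~ s/^join\((.*)\)$/$1/;      # strip the join( ... ) wrapper if present
    my @segments;
    for my $item (split /,/, $str) {
        my ($acc, $start, $end) = $item =~ /^([\w.]+):(\d+)\.\.(\d+)$/
            or die "Unrecognized segment: $item\n";
        push @segments, { acc => $acc, start => $start, end => $end };
    }
    return @segments;
}

# The /coded_by value from the protein record shown above.
my $coded_by = 'join(AF166007.1:430..1544,AF166008.1:345..507,'
             . 'AF166008.1:8976..9071,AF166008.1:9952..10044,'
             . 'AF166008.1:13222..13469,AF166008.1:15123..15300)';

my @segments = parse_coded_by($coded_by);
my $total = 0;
for my $s (@segments) {
    $total += $s->{end} - $s->{start} + 1;   # inclusive coordinates
    print join("\t", $s->{acc}, $s->{start}, $s->{end}), "\n";
}
print "total nt: $total\n";
```

For this record the six segments add up to 1893 nt, which is 630 codons plus
a stop codon, consistent with the 1..630 location on the protein. The parsed
accessions and ranges could then be used to fetch and splice the nucleotide
records yourself.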

> So I tried using the corresponding nucleic acid record, gi7648671,
> which holds the 5' end of the CDS (I used $gbh on the get_Seq step).
>
> That yielded the correct amino acid sequence for the first half of the
> protein, encoded by the sequence in the record itself, but it did not
> retrieve the other nucleic acid record that is specified to contain
> the 3' end of the sequence.
> ...
> So, as far as I can see, passing the DB handle isn't causing the
> spliced_seq method to go elsewhere for the nucleic acid sequence data.
>
> I thought that was the purpose.
>
> Can anyone enlighten me, or is this a bug?
>
> Warren Gallin

This appears to be a bug.  The remote sequence is designated in the  
feature:

      CDS             join(430..1544,AF166008.1:345..507,
                      AF166008.1:8976..9071,AF166008.1:9952..10044,
                      AF166008.1:13222..13469,AF166008.1:15123..15300)
                      /gene="KCND2"
                      /codon_start=1
                      /product="voltage-gated potassium channel Kv4.2"
                      /protein_id="AAF65618.1"
                      /db_xref="GI:7648673"

but the location is truncated when passed through SeqIO to genbank  
output, which explains the spliced_seq() problem:

      CDS             430..1544
                      /db_xref="GI:7648673"
                      /codon_start=1
                      /protein_id="AAF65618.1"
                      /gene="KCND2"

I'll try looking into this; if I can't get to it immediately I'll  
file a bug report.  Thanks!

chris