[Bioperl-l] Still trouble with remote joined records
Chris Fields
cjfields at uiuc.edu
Thu Oct 18 02:49:37 UTC 2007
On Oct 17, 2007, at 8:42 PM, Warren Gallin wrote:
> I must be missing something, but I cannot get the procedure outlined
> in FAQ 5.5 to do what I think it should (maybe my expectations are
> incorrect).
>
> I think I have an up-to-date release:
>
> GallinPowerbook:~ wgallin$ perl -MBio::Root::Version -e 'print $Bio::Root::Version::VERSION,"\n"'
> 1.005002102
> ...
> I get an error that a protein sequence cannot be translated. I
> thought that by passing the GenBank handle to the spliced_seq
> method, it would retrieve the necessary nucleic acid sequence
> records and splice together the specified ranges.
No, the error is expected:
MSG: Can't translate an amino acid sequence.
The record in question is a protein record, so you are retrieving a
protein sequence, which can't be translated (in other words, the
exception is valid). Note that the 'CDS' feature in this case has a
location of 1..630 (indicated by the arrow), i.e. protein coordinates,
so spliced_seq() just returns the protein residues:
     CDS             1..630    <---------
                     /gene="KCND2"
                     /coded_by="join(AF166007.1:430..1544,AF166008.1:345..507,
                     AF166008.1:8976..9071,AF166008.1:9952..10044,
                     AF166008.1:13222..13469,AF166008.1:15123..15300)"
The 'coded_by' tag holds the data you want; however, its value is
stored only as a plain string, not as a location that spliced_seq()
can use.
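Since the value is only a string, one workaround is to parse it yourself. Below is a minimal sketch in plain Perl (no BioPerl calls), assuming the simple join(ACC:start..end,...) form shown above; it does not handle complement(), order(), or partial '<'/'>' ends. The subroutine name parse_coded_by is hypothetical, not part of BioPerl.

```perl
use strict;
use warnings;

# The /coded_by value from the protein record, including the
# whitespace introduced by GenBank line wrapping.
my $coded_by = 'join(AF166007.1:430..1544,AF166008.1:345..507,
AF166008.1:8976..9071,AF166008.1:9952..10044,
AF166008.1:13222..13469,AF166008.1:15123..15300)';

# Hypothetical helper: split a coded_by string into
# [accession, start, end] triples (1-based, inclusive).
sub parse_coded_by {
    my ($str) = @_;
    $str =~ s/\s+//g;               # drop line-wrap whitespace
    $str =~ s/^join\((.*)\)$/$1/;   # strip the outer join(...), if any
    my @segments;
    for my $part (split /,/, $str) {
        my ($acc, $start, $end) = $part =~ /^([\w.]+):(\d+)\.\.(\d+)$/
            or die "Unrecognized segment: $part\n";
        push @segments, [ $acc, $start, $end ];
    }
    return @segments;
}

my @segs = parse_coded_by($coded_by);
printf "%s %d..%d\n", @$_ for @segs;

# Sum the segment lengths as a sanity check.
my $total = 0;
$total += $_->[2] - $_->[1] + 1 for @segs;
print "total: $total nt\n";   # total: 1893 nt
```

The total of 1893 nt is 631 codons, consistent with the 630-residue protein plus a stop codon. Each triple could then be used to fetch the corresponding nucleotide record and extract the subsequence.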
> So I tried using the corresponding nucleic acid record, gi7648671,
> which holds the 5' end of the CDS (I used $gbh on the get_Seq step).
>
> That yielded the correct amino acid sequence for the first half of the
> protein, encoded by the sequence in the record itself, but it did not
> retrieve the other nucleic acid record that is specified to contain
> the 3' end of the sequence.
> ...
> So, as far as I can see, passing the DB handle isn't causing the
> spliced_seq method to go elsewhere for the nucleic acid sequence data.
>
> I thought that was the purpose.
>
> Can anyone enlighten me, or is this a bug?
>
> Warren Gallin
This appears to be a bug. The remote sequences are specified in the
nucleotide record's CDS feature:
     CDS             join(430..1544,AF166008.1:345..507,AF166008.1:8976..9071,
                     AF166008.1:9952..10044,AF166008.1:13222..13469,
                     AF166008.1:15123..15300)
                     /gene="KCND2"
                     /codon_start=1
                     /product="voltage-gated potassium channel Kv4.2"
                     /protein_id="AAF65618.1"
                     /db_xref="GI:7648673"
but the location is truncated to the local range when the record is
passed through SeqIO to GenBank output, which explains the
spliced_seq() problem:
     CDS             430..1544
                     /db_xref="GI:7648673"
                     /codon_start=1
                     /protein_id="AAF65618.1"
                     /gene="KCND2"
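To see why the truncation matters, here is a minimal sketch of what splicing a join() location involves: ranges without an accession prefix come from the record itself, while accession-prefixed ranges have to be fetched from elsewhere. This is plain Perl, not BioPerl's implementation. The subroutine splice_location is hypothetical, the %remote hash is an assumed stand-in for a database handle, and the sequences and the 'REMOTE.1' accession are toy values, not real data.

```perl
use strict;
use warnings;

# Hypothetical helper: splice a join() location string. Unprefixed
# ranges are taken from $local; "ACC:start..end" ranges are looked up
# in $remote, a hash used here in place of a database handle.
sub splice_location {
    my ($loc, $local, $remote) = @_;
    $loc =~ s/\s+//g;
    $loc =~ s/^join\((.*)\)$/$1/;
    my $spliced = '';
    for my $part (split /,/, $loc) {
        my ($acc, $start, $end) = $part =~ /^(?:([\w.]+):)?(\d+)\.\.(\d+)$/
            or die "Unrecognized segment: $part\n";
        my $seq = defined $acc ? $remote->{$acc} : $local;
        die "No sequence available for $acc\n" unless defined $seq;
        # Feature coordinates are 1-based and inclusive.
        $spliced .= substr($seq, $start - 1, $end - $start + 1);
    }
    return $spliced;
}

# Toy sequences only, chosen so the pieces are easy to tell apart.
my $local  = 'AAACCCGGGTTT';
my %remote = ('REMOTE.1' => 'acgtacgtac');

my $full      = splice_location('join(1..6,REMOTE.1:3..8)', $local, \%remote);
my $truncated = splice_location('1..6', $local, \%remote);
print "$full\n";        # AAACCCgtacgt
print "$truncated\n";   # AAACCC
```

Once the location has been cut down to its local segment, there is nothing left that refers to the remote record, so no database handle can help; that is consistent with getting only the portion of the protein encoded by the local record.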
I'll try looking into this; if I can't get to it immediately, I'll
file a bug report. Thanks!
chris