[EMBOSS] transeq and ambiguous codons

Peter biopython at maubp.freeserve.co.uk
Fri Jul 10 09:14:42 UTC 2009


On Thu, Jul 9, 2009 at 10:08 AM, Peter Rice<pmr at ebi.ac.uk> wrote:
>
> Peter C. wrote:
>> However, consider the codon TRR. R means A or G, so this can mean TAA,
>> TGA, TAG or TGG which translate to stop or W (both EMBOSS and the NCBI
>> standard table agree here). Therefore the translation of TRR should be
>> "* or W", which I would expect based on the above examples to result
>> in "X". But instead EMBOSS transeq gives "*":
>
> This is a side effect of the way backtranslation works...

OK, leaving TRR aside for the moment (I'm not sure I'd have done it that
way, but I think I follow your logic), I have some more problem cases for
you to consider (all using the default standard NCBI table 1).

Most of these are 'unambiguous ambiguous codons' as you put it, and
I would agree that using X when a more specific letter is possible isn't
ideal but isn't actually wrong. The "ATS" and related codons (see below),
however, are simply wrong.

--------------------------------------------------------------------------------------

TRA means TAA or TGA, which are both stop codons. Therefore TRA
should translate as a stop, not as an X:

$ transeq asis:TAATGATRA -stdout -auto -osformat raw
**X
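
The reasoning in this and the following cases can be checked mechanically.
Here is a minimal Python sketch (my own illustration, not EMBOSS code; the
names TABLE, IUPAC and possible_translations are made up for this example)
which expands an ambiguous codon into its concrete codons and collects
their translations under NCBI table 1:

```python
from itertools import product

# NCBI standard table 1, laid out in the usual TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def possible_translations(codon):
    """Return the set of residues an ambiguous codon could encode."""
    expansions = product(*(IUPAC[base] for base in codon.upper()))
    return {TABLE["".join(c)] for c in expansions}


for codon in ("TRA", "YTA", "ATW", "ATS", "TRR"):
    print(codon, sorted(possible_translations(codon)))
```

A codon whose set has a single member is an 'unambiguous ambiguous codon';
anything larger is a candidate for X.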

--------------------------------------------------------------------------------------

Now look at YTA, which means CTA or TTA which encode L, so
YTA should be L not X:

$ transeq asis:CTATTAYTA -stdout -auto -osformat raw
LLX

Likewise for YTG and YTR (but not YTN, which also covers TTT and TTC,
both F).

--------------------------------------------------------------------------------------

Another example, ATW means ATA or ATT, which both translate as I,
so ATW should translate as I not X:

$ transeq asis:ATAATTATW -stdout -auto -osformat raw
IIX

--------------------------------------------------------------------------------------

Conversely, ATS means ATC or ATG (remember S means G or C), which
translate as I and M respectively. Therefore ATS should translate as X,
and not I:

$ transeq asis:ATCATGATS -stdout -auto -osformat raw
IMI

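Some other AT* codons which include ATG show the same bug. Note that H
is not one of them: H means A, C or T (it is V that means A, C or G), so
ATH covers ATA, ATC and ATT and is correctly translated as I here: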

$ transeq asis:ATAATCATGATH -stdout -auto -osformat raw
IIMI

[*** This one strikes me as a clear bug ***]

--------------------------------------------------------------------------------------

Now for another debatable one, RAT means AAT or GAT which code
for N and D. So, you could use B (Asx) here rather than the broader X.

$ transeq asis:AATGATRAT -stdout -auto -osformat raw
NDX

Again, the same thing for others like RAC -> X not B, and RAY -> X not B.

Similarly, you don't use J to mean leucine (L) or isoleucine (I), and
opt for X instead (again, this is justifiable), e.g. WTA:

$ transeq asis:ATATTAWTA -stdout -auto -osformat raw
ILX
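
To make the choice between B, J and X concrete, here is a rough Python
sketch of one possible policy (purely illustrative, not what transeq does;
the names PAIR_CODES and consensus_residue are hypothetical, and Z for
Q or E is my addition by analogy with B):

```python
from itertools import product

# NCBI standard table 1, laid out in the usual TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Ambiguous protein letters for specific pairs of residues.
PAIR_CODES = {
    frozenset("ND"): "B",  # Asx
    frozenset("QE"): "Z",  # Glx (assumed here by analogy with B)
    frozenset("LI"): "J",  # Leu or Ile
}


def consensus_residue(codon):
    """Pick the most specific single letter for an ambiguous codon."""
    expansions = product(*(IUPAC[base] for base in codon.upper()))
    residues = frozenset(TABLE["".join(c)] for c in expansions)
    if len(residues) == 1:
        return next(iter(residues))       # e.g. TRA -> *, YTA -> L
    return PAIR_CODES.get(residues, "X")  # e.g. RAT -> B, ATS -> X


for codon in ("RAT", "RAY", "WTA", "ATS", "TRR"):
    print(codon, consensus_residue(codon))
```

Under this policy anything mixing a stop with a residue, such as TRR,
still falls back to X.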

--------------------------------------------------------------------------------------

This list is only partial, and only for the standard table.
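
A complete list for the standard table could be generated rather than
collected by hand. The following Python sketch (illustrative only; the
function name classify_ambiguous_codons is made up, and nothing here
calls EMBOSS) walks every codon containing at least one ambiguity code
and splits them into those with exactly one possible translation and
those with several, which could then be compared against transeq output:

```python
from itertools import product

# NCBI standard table 1, laid out in the usual TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def classify_ambiguous_codons():
    """Split ambiguous codons by whether their translation is unique."""
    single, mixed = {}, {}
    for letters in product(IUPAC, repeat=3):
        codon = "".join(letters)
        if all(base in "ACGT" for base in codon):
            continue  # plain codon, nothing ambiguous about it
        expansions = product(*(IUPAC[base] for base in codon))
        residues = {TABLE["".join(c)] for c in expansions}
        (single if len(residues) == 1 else mixed)[codon] = residues
    return single, mixed


single, mixed = classify_ambiguous_codons()
print(len(single), "ambiguous codons with one possible translation")
print(len(mixed), "ambiguous codons with several possible translations")
```

Codons in the first group should never need X; codons in the second
group should never get a specific residue or a stop.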

Peter C.


