[EMBOSS] transeq and ambiguous codons

Peter biopython at maubp.freeserve.co.uk
Thu Jul 22 11:36:04 UTC 2010


Hi again,

Now that I have installed the latest and greatest version, EMBOSS 6.3.1,
I'm revisiting some old issues I had with EMBOSS, in this case 'unambiguous
ambiguous codons' and other translation issues.

On Fri, Jul 10, 2009 at 10:14 AM, Peter C. wrote:
> On Thu, Jul 9, 2009 at 10:08 AM, Peter Rice wrote:
>>
>> Peter C. wrote:
>>> However, consider the codon TRR. R means A or G, so this can mean TAA,
>>> TGA, TAG or TGG which translate to stop or W (both EMBOSS and the NCBI
>>> standard table agree here). Therefore the translation of TRR should be
>>> "* or W", which I would expect based on the above examples to result
>>> in "X". But instead EMBOSS transeq gives "*":
>>
>> This is a side effect of the way backtranslation works...
>
> OK, leaving TRR aside for the moment (I'm not sure I'd have done it that
> way, but I think I follow your logic), I have some more problem cases for
> you to consider (all using the default standard NCBI table 1).
>
> Most of these are 'unambiguous ambiguous codons' as you put it, and
> I would agree that using X when a more specific letter is possible isn't ideal
> but isn't actually wrong. The "ATS" and related codons (see below)
> however are simply wrong.
>
> --------------------------------------------------------------------------------------
>
> TRA means TAA or TGA, which are both stop codons. Therefore TRA
> should translate as a stop, not as an X:
>
> $ transeq asis:TAATGATRA -stdout -auto -osformat raw
> **X

Same on EMBOSS 6.3.1. Shouldn't TRA translate as a stop?
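The rule being argued for here can be sketched in a few lines of Python: expand each IUPAC ambiguity code into the bases it stands for, translate every resulting concrete codon with the standard table (NCBI table 1), and return that residue only if all the expansions agree, otherwise X. This is a minimal illustration assuming plain IUPAC DNA codes and table 1 only; the names `translate_codon` and `translate` are hypothetical, and this is not how EMBOSS (or Biopython) actually implements translation.

```python
from itertools import product

# NCBI table 1 (standard code); codons ordered TTT, TTC, TTA, ..., GGG.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO)}

# IUPAC nucleotide ambiguity codes (DNA only).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT",
         "S": "CG", "W": "AT", "K": "GT", "M": "AC", "B": "CGT",
         "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT"}

def translate_codon(codon):
    """Return the residue shared by every expansion of codon, else 'X'."""
    choices = [IUPAC[base] for base in codon.upper()]
    residues = {TABLE["".join(c)] for c in product(*choices)}
    return residues.pop() if len(residues) == 1 else "X"

def translate(seq):
    """Translate seq codon by codon (a trailing partial codon is ignored)."""
    return "".join(translate_codon(seq[i:i + 3])
                   for i in range(0, len(seq) - 2, 3))

print(translate("TAATGATRA"))  # TRA = TAA/TGA, both stops -> ***
print(translate_codon("TRR"))  # TAA/TAG/TGA/TGG = stop or W -> X
```

Under this rule TRA comes out as a stop, and TRR (the earlier case) as X.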

> --------------------------------------------------------------------------------------
>
> Now look at YTA, which means CTA or TTA which encode L, so
> YTA should be L not X:
>
> $ transeq asis:CTATTAYTA -stdout -auto -osformat raw
> LLX

Same on EMBOSS 6.3.1: it gives X instead of the specific amino acid
(i.e. YTA is an "unambiguous ambiguous codon" for L).

> Likewise for YTG and YTR, and YTN.

I haven't re-checked these.
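Those remaining YT? cases can at least be checked mechanically by listing every concrete codon each one expands to under table 1. A minimal sketch, assuming standard IUPAC codes; the helper names `expand` and `resolve` are hypothetical. By this rule YTG and YTR only ever give L, but YTN also covers TTT and TTC (both F), so X would be the expected answer for YTN.

```python
from itertools import product

# NCBI table 1 (standard code); codons ordered TTT, TTC, TTA, ..., GGG.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO)}

# IUPAC nucleotide ambiguity codes (DNA only).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT",
         "S": "CG", "W": "AT", "K": "GT", "M": "AC", "B": "CGT",
         "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT"}

def expand(codon):
    """Map every concrete codon an ambiguous codon stands for to its residue."""
    concrete = ("".join(c) for c in product(*(IUPAC[b] for b in codon)))
    return {c: TABLE[c] for c in concrete}

def resolve(codon):
    """Return the single residue all expansions share, else 'X'."""
    residues = set(expand(codon).values())
    return residues.pop() if len(residues) == 1 else "X"

for codon in ("YTA", "YTG", "YTR", "YTN"):
    print(codon, sorted(expand(codon)), "->", resolve(codon))
```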

> --------------------------------------------------------------------------------------
>
> Another example, ATW means ATA or ATT, which both translate as I,
> so ATW should translate as I not X:
>
> $ transeq asis:ATAATTATW -stdout -auto -osformat raw
> IIX

Same on EMBOSS 6.3.1: it gives X instead of the specific amino acid
(i.e. ATW is an "unambiguous ambiguous codon" for I).

> --------------------------------------------------------------------------------------
>
> Conversely, ATS means ATC or ATG, which translate as I and M respectively.
> Remember S means G or C. Therefore ATS should translate as X, and
> not I:
>
> $ transeq asis:ATCATGATS -stdout -auto -osformat raw
> IMI

Same on EMBOSS 6.3.1: it gives a potentially wrong amino acid instead of X.

> Likewise H means A, G or C, so ATH shows the same bug, as do some
> other AT* codons:
>
> $ transeq asis:ATAATCATGATH -stdout -auto -osformat raw
> IIMI
>
> [*** This one strikes me as a clear bug ***]

Same on EMBOSS 6.3.1: it gives a potentially wrong amino acid instead of X.
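Since only the third position varies, the whole AT? family can be tabulated in one go. Under table 1, ATA, ATC and ATT give I while ATG gives M, so a third-position code should resolve to I exactly when it excludes G, and to X otherwise. A minimal sketch, assuming the standard IUPAC codes; the helper name `resolve` is hypothetical. Note that the IUPAC convention defines H as A, C or T ("not G"), so by this rule ATH would expand to ATA/ATC/ATT and resolve to I, whereas ATS, ATK, ATR, ATB, ATD, ATV and ATN would all resolve to X.

```python
from itertools import product

# NCBI table 1 (standard code); codons ordered TTT, TTC, TTA, ..., GGG.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO)}

# IUPAC nucleotide ambiguity codes (DNA only).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT",
         "S": "CG", "W": "AT", "K": "GT", "M": "AC", "B": "CGT",
         "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT"}

def resolve(codon):
    """Return the single residue all expansions share, else 'X'."""
    residues = {TABLE["".join(c)] for c in product(*(IUPAC[b] for b in codon))}
    return residues.pop() if len(residues) == 1 else "X"

# Every ambiguity code in the third position of an AT? codon.
for code in "RYSWKMBDHVN":
    print("AT" + code, "=", "/".join("AT" + b for b in IUPAC[code]),
          "->", resolve("AT" + code))
```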

As I noted before, this list is only partial, and only for the standard table.
I could compile a much longer list of oddities using the Biopython
translation as a reference if you want.
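A longer list of this kind can be generated by brute force: enumerate every three-letter combination of the 15 IUPAC codes, skip the 64 fully specified codons, and keep those whose expansions all translate to the same residue (or all to stop). This sketch assumes the standard table and the "all expansions agree" rule discussed above; it does not call Biopython or reproduce EMBOSS output, and the name `UNAMBIGUOUS_AMBIGUOUS` is hypothetical.

```python
from itertools import product

# NCBI table 1 (standard code); codons ordered TTT, TTC, TTA, ..., GGG.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO)}

# IUPAC nucleotide ambiguity codes (DNA only).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT",
         "S": "CG", "W": "AT", "K": "GT", "M": "AC", "B": "CGT",
         "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT"}

def resolve(codon):
    """Return the single residue all expansions share, else 'X'."""
    residues = {TABLE["".join(c)] for c in product(*(IUPAC[b] for b in codon))}
    return residues.pop() if len(residues) == 1 else "X"

# Ambiguous codons (at least one non-ACGT letter) that still map to a
# single amino acid or to stop under table 1.
UNAMBIGUOUS_AMBIGUOUS = {
    "".join(c): resolve("".join(c))
    for c in product(IUPAC, repeat=3)
    if any(b not in "ACGT" for b in c) and resolve("".join(c)) != "X"
}

for codon, residue in sorted(UNAMBIGUOUS_AMBIGUOUS.items()):
    print(codon, residue)
```

The resulting mapping could then be compared codon by codon against transeq output for the same codons.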

Regards,

Peter C.
