[EMBOSS] backtranseq

Peter Rice pmr at ebi.ac.uk
Thu Jul 21 15:55:15 UTC 2005


Nadeem Faruque wrote:
> Josh Cherry wrote:
>>But this won't work the way some might hope due to the nature of the
>>genetic code, specifically (in the standard code) the three amino acids
>>that have six codons each (S, L, and R).  Consider serine, encoded by UCN
>>and AGY.  Would you like this to be back-translated to WSN?  That matches
>>all six serine codons but also ten non-serine codons.  Some people may
>>still want to use it in a probe or primer though.
> 
> I was going to use Serine in my example but realised 'WSN' was a bit too degenerate to be a useful example.
> I understand you could not roundtrip peptide->DNA->peptide with my suggested behaviour

... I bet you can!!!  If you know the sequence came from a backtranslation, WSN 
would surely be serine (as would UCN or AGY). If any of the 3 positions is more 
specific, that could indicate one of the other possibilities.
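
The counts quoted above for WSN (six serine codons plus ten others) can be
checked by expanding the ambiguity codes. What follows is only a rough Python
sketch, assuming the standard genetic code and the IUPAC nucleotide codes in
the RNA alphabet. The names CODON_TABLE, expand and matches are hypothetical
helpers for illustration and are not part of EMBOSS.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in UCAG order (an assumption of
# this sketch; other translation tables would need a different string).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG", "Y": "CU", "S": "CG", "W": "AU", "K": "GU", "M": "AC",
    "B": "CGU", "D": "AGU", "H": "ACU", "V": "ACG", "N": "ACGU",
}


def expand(codon):
    """Return every concrete codon matched by an ambiguous codon."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in codon))]


def matches(codon):
    """Count how many matched codons translate to each amino acid."""
    return Counter(CODON_TABLE[c] for c in expand(codon))


wsn = matches("WSN")
print(f"{wsn['S']} serine codons, {sum(wsn.values()) - wsn['S']} others")
# prints "6 serine codons, 10 others"
```

The ten others are ACN (T), AGR (R) and UGN (C, W and stop), so a WSN that
came straight out of a backtranslation is still unambiguous, while a WSN seen
in an arbitrary sequence is not.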

I would be happy to accept a lower case residue if the result is uncertain (if 
the ambiguity codes do not match what one would expect from the genetic code 
in a backtranslation). For ASN the answer could be T (ACN), S (AGY) or R (AGR), 
with T ('t') the favourite by a majority vote (ASN matches 4 of the 4 codons 
for T, but only 2 of the 6 for each of the others).

X can be used if all else fails. After all, we could be translating a sequence 
with a SNP. A command line option can give the user a choice of trying to 
resolve unclear positions or using X.
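
The majority vote and the X fallback described above could be sketched roughly
as follows. This is an illustration under stated assumptions, not how
backtranseq or any EMBOSS program behaves: translate_ambiguous and its
resolve argument are hypothetical names, the standard genetic code is assumed,
and returning X on a tied vote is a choice made for this sketch. The EXPECTED
table holds the degenerate codons listed in this message.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
TOTALS = Counter(CODON_TABLE.values())  # number of codons per amino acid

IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG", "Y": "CU", "S": "CG", "W": "AU", "K": "GU", "M": "AC",
    "B": "CGU", "D": "AGU", "H": "ACU", "V": "ACG", "N": "ACGU",
}

# Degenerate codons one would expect from a backtranslation.
EXPECTED = {
    "GCN": "A", "UGY": "C", "GAY": "D", "GAR": "E", "UUY": "F", "GGN": "G",
    "CAY": "H", "AUH": "I", "AAR": "K", "YUN": "L", "AUG": "M", "AAY": "N",
    "CCN": "P", "CAR": "Q", "MGN": "R", "WSN": "S", "ACN": "T", "GUN": "V",
    "UGG": "W", "UAY": "Y", "URR": "*",
}


def translate_ambiguous(codon, resolve=True):
    """Translate one possibly ambiguous codon to a single residue.

    Upper case: certain. Lower case: majority-vote guess. X: unresolved.
    """
    codon = codon.upper().replace("T", "U")
    if codon in EXPECTED:  # exactly what a backtranslation would produce
        return EXPECTED[codon]
    concrete = ["".join(p) for p in product(*(IUPAC[b] for b in codon))]
    hits = Counter(CODON_TABLE[c] for c in concrete)
    if len(hits) == 1:  # every matched codon gives the same residue
        return next(iter(hits))
    if not resolve:  # the user asked for X instead of a guess
        return "X"
    # Score each candidate by the fraction of its own codons that match.
    scores = {aa: Fraction(n, TOTALS[aa]) for aa, n in hits.items()}
    best = max(scores.values())
    winners = [aa for aa, score in scores.items() if score == best]
    return winners[0].lower() if len(winners) == 1 else "X"


print(translate_ambiguous("ASN"))                 # prints "t"
print(translate_ambiguous("ASN", resolve=False))  # prints "X"
print(translate_ambiguous("WSN"))                 # prints "S"
```

Without the exact-match step, WSN would tie between S (6/6) and T (4/4), which
is why the expected codons are checked before any voting.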

Degenerate codons would be:

A GCN
C UGY
D GAY
E GAR
F UUY
G GGN
H CAY
I AUH
K AAR
L YUN (CUN/UUR) - also matches F (UUY)
M AUG
N AAY
P CCN
Q CAR
R MGN (CGN/AGR) - also matches S (AGY)
S WSN (UCN/AGY) - also matches T (ACN)
                   also matches R (AGR)
                   also matches C and W and * (UGN)
T ACN
V GUN
W UGG
Y UAY
* URR - also matches W (UGG)
m NUG (start codon)
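
The amino acid rows of the table above can be reproduced mechanically by taking,
for each residue, the union of bases seen at each codon position and mapping
that set back to an IUPAC code. The following Python sketch does that, assuming
the standard genetic code. The function names degenerate_codon and
extra_matches are hypothetical, and the start codon row (m NUG) is not derived
here.

```python
from itertools import product

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG", "Y": "CU", "S": "CG", "W": "AU", "K": "GU", "M": "AC",
    "B": "CGU", "D": "AGU", "H": "ACU", "V": "ACG", "N": "ACGU",
}
# Reverse lookup: set of bases -> ambiguity code.
CODE_FOR = {frozenset(bases): code for code, bases in IUPAC.items()}


def expand(codon):
    """Return every concrete codon matched by an ambiguous codon."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in codon))]


def degenerate_codon(aa):
    """Smallest single ambiguous codon covering all codons of a residue."""
    codons = [c for c, a in CODON_TABLE.items() if a == aa]
    return "".join(CODE_FOR[frozenset(c[i] for c in codons)] for i in range(3))


def extra_matches(aa):
    """Residues other than aa that its degenerate codon also matches."""
    return {CODON_TABLE[c] for c in expand(degenerate_codon(aa))} - {aa}


for aa in sorted(set(AMINO)):
    extras = "".join(sorted(extra_matches(aa)))
    note = f" - also matches {extras}" if extras else ""
    print(f"{aa} {degenerate_codon(aa)}{note}")
```

All 21 degenerate codons come out distinct, so an exact lookup on them is
enough to take a backtranslated sequence back to the original peptide.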



