[EMBOSS] backtranseq
Peter Rice
pmr at ebi.ac.uk
Thu Jul 21 15:55:15 UTC 2005
Nadeem Faruque wrote:
> Josh Cherry wrote:
>>But this won't work the way some might hope due to the nature of the
>>genetic code, specifically (in the standard code) the three amino acids
>>that have six codons each (S, L, and R). Consider serine, encoded by UCN
>>and AGY. Would you like this to be back-translated to WSN? That matches
>>all six serine codons but also ten non-serine codons. Some people may
>>still want to use it in a probe or primer though.
>
> I was going to use Serine in my example but realised 'WSN' was a bit too degenerate to be a useful example.
> I understand you could not roundtrip peptide->DNA->peptide with my suggested behaviour
... I bet you can! Assuming you have a back-translated sequence, WSN would
surely be serine (as would UCN or AGY). If any of the three positions is more
specific (ACN, AGR or UGN, for example), that could indicate one of the other possibilities.
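Josh's counts can be checked by expanding WSN into its 16 concrete codons and translating each one. This is a minimal Python sketch, assuming the standard genetic code (NCBI translation table 1) written with U. `expand` and `tally` are hypothetical helper names, not EMBOSS functions.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI table 1): 64 codons in UCAG order, U for RNA.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC nucleotide ambiguity codes, using RNA letters.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG", "Y": "CU", "S": "CG", "W": "AU", "K": "GU", "M": "AC",
    "B": "CGU", "D": "AGU", "H": "ACU", "V": "ACG", "N": "ACGU",
}


def expand(codon):
    """Every concrete codon matched by a degenerate codon such as 'WSN'."""
    return ["".join(c) for c in product(*(IUPAC[b] for b in codon.upper()))]


def tally(codon):
    """How many of the concrete codons encode each amino acid ('*' = stop)."""
    return Counter(CODE[c] for c in expand(codon))


counts = tally("WSN")
print(sorted(counts.items()))
# [('*', 1), ('C', 2), ('R', 2), ('S', 6), ('T', 4), ('W', 1)]
print(counts["S"], sum(counts.values()) - counts["S"])  # 6 10
```

Tightening one position narrows the result: `tally("UCN")` and `tally("AGY")` contain serine only, while `tally("ASN")` splits into 4 T, 2 S and 2 R.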
I would be happy to accept a lower-case residue if the result is uncertain (that
is, if the ambiguity codes do not match what one would expect from the genetic
code in a back-translation). For ASN the answer could be T (ACN), S (AGY) or
R (AGR), with T ('t') the favourite by majority vote: all 4 threonine codons
match, against only 2 of the 6 codons for each of the others.
X can be used if all else fails. After all, we could be translating a sequence
with a SNP. A command-line option could give the user the choice between trying
to resolve unclear positions and using X.
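The rules above (trust an exact back-translation pattern, otherwise take a lower-case majority vote, otherwise X) can be sketched in Python. This assumes the standard genetic code and scores each candidate by the fraction of its own codons the pattern covers, as in the ASN example. `resolve_codon`, `translate` and the `resolve` flag are hypothetical names; the flag only stands in for the suggested command-line option and is not an existing EMBOSS qualifier.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI table 1): 64 codons in UCAG order, U for RNA.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC nucleotide ambiguity codes, using RNA letters.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG", "Y": "CU", "S": "CG", "W": "AU", "K": "GU", "M": "AC",
    "B": "CGU", "D": "AGU", "H": "ACU", "V": "ACG", "N": "ACGU",
}

# Degenerate codon that back-translation emits for each residue (table in this post).
BACK = {
    "A": "GCN", "C": "UGY", "D": "GAY", "E": "GAR", "F": "UUY", "G": "GGN",
    "H": "CAY", "I": "AUH", "K": "AAR", "L": "YUN", "M": "AUG", "N": "AAY",
    "P": "CCN", "Q": "CAR", "R": "MGN", "S": "WSN", "T": "ACN", "V": "GUN",
    "W": "UGG", "Y": "UAY", "*": "URR",
}
FORWARD = {codon: aa for aa, codon in BACK.items()}
# Number of synonymous codons per residue (6 for S, L and R; 1 for M and W).
DEGENERACY = Counter(CODE.values())


def expand(codon):
    """Every concrete codon matched by a degenerate codon."""
    return ["".join(c) for c in product(*(IUPAC[b] for b in codon))]


def resolve_codon(codon, resolve=True):
    """Upper case: certain. Lower case: best guess by vote. X: unresolved."""
    codon = codon.upper().replace("T", "U")
    if codon in FORWARD:  # assume it came from back-translation: WSN -> S
        return FORWARD[codon]
    matched = Counter(CODE[c] for c in expand(codon))
    if len(matched) == 1:  # every concrete codon agrees: UCN or AGY -> S
        return next(iter(matched))
    if not resolve:
        return "X"
    # Vote: fraction of each candidate's own codons covered by the pattern.
    scores = {aa: n / DEGENERACY[aa] for aa, n in matched.items()}
    best = max(scores.values())
    winners = [aa for aa, s in scores.items() if s == best]
    return winners[0].lower() if len(winners) == 1 else "X"


def translate(seq, resolve=True):
    """Translate codon by codon; trailing incomplete bases are dropped."""
    return "".join(resolve_codon(seq[i:i + 3], resolve)
                   for i in range(0, len(seq) - 2, 3))


print(resolve_codon("ASN"))                   # t
print(translate("WSNASNGCN"))                 # StA
print(translate("WSNASNGCN", resolve=False))  # SXA
```

A tie between equally well covered candidates (NNN, for instance) falls back to X rather than picking one arbitrarily.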
Degenerate codons (IUPAC ambiguity codes, standard genetic code) would be:
A GCN
C UGY
D GAY
E GAR
F UUY
G GGN
H CAY
I AUH
K AAR
L YUN (CUN/UUR) - also matches F (UUY)
M AUG
N AAY
P CCN
Q CAR
R MGN (CGN/AGR) - also matches S (AGY)
S WSN (UCN/AGY) - also matches T (ACN)
                  also matches R (AGR)
                  also matches C and W and * (UGN)
T ACN
V GUN
W UGG
Y UAY
* URR - also matches W (UGG)
m NUG (start codon)
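Every row except the start-codon line can be regenerated mechanically: take the position-wise union of each residue's codons in the standard code, write it as an IUPAC code, then expand that degenerate codon to see which other residues it also covers. This is a minimal Python sketch, assuming NCBI translation table 1 and RNA letters. `degenerate_codon` and `also_matches` are hypothetical helper names, not EMBOSS functions.

```python
from itertools import product

# Standard genetic code (NCBI table 1): 64 codons in UCAG order, U for RNA.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC nucleotide ambiguity codes (RNA letters) and the reverse lookup.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG", "Y": "CU", "S": "CG", "W": "AU", "K": "GU", "M": "AC",
    "B": "CGU", "D": "AGU", "H": "ACU", "V": "ACG", "N": "ACGU",
}
CODE_FOR = {frozenset(bases): letter for letter, bases in IUPAC.items()}


def expand(codon):
    """Every concrete codon matched by a degenerate codon."""
    return ["".join(c) for c in product(*(IUPAC[b] for b in codon))]


def degenerate_codon(aa):
    """Position-wise union of all codons for aa, written as IUPAC codes."""
    codons = [c for c, a in CODE.items() if a == aa]
    return "".join(CODE_FOR[frozenset(c[i] for c in codons)] for i in range(3))


def also_matches(aa):
    """Other residues whose codons fall inside aa's degenerate codon."""
    return sorted({CODE[c] for c in expand(degenerate_codon(aa))} - {aa})


for aa in sorted(set(AMINO)):
    extra = also_matches(aa)
    print(aa, degenerate_codon(aa), "also matches " + " ".join(extra) if extra else "")
```

The union is what makes L, R, S and * overlap other residues: a single IUPAC pattern cannot express "UCN or AGY" without also admitting ACN, AGR and UGN.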