[Biopython] back-translation method for Seq object?
Jonathan Blakes
jvb at Cs.Nott.AC.UK
Wed Mar 9 15:33:28 UTC 2011
This is a reply to an old thread (October 2008), but I thought someone
might find it useful.
In that thread, which discussed representing back-translations with
ambiguous bases to avoid the combinatorial explosion of enumerating every
possible back-translation, Bruce Southey gave a table similar to the one
below. However, some of its ambiguous codons were incorrect, and others
were too ambiguous and covered more than one amino acid. The codons for
stop (*) were also missing. Some of these were corrected later in the
thread, but not all.
Here are the correct ambiguous codons for the standard genetic code:
* = TAG, TAA, TGA = TAR, TGA
A = GCT, GCC, GCA, GCG = GCN
C = TGT, TGC = TGY
D = GAT, GAC = GAY
E = GAA, GAG = GAR
F = TTT, TTC = TTY
G = GGT, GGC, GGA, GGG = GGN
H = CAT, CAC = CAY
I = ATT, ATC, ATA = ATH
K = AAA, AAG = AAR
L = TTA, TTG, CTT, CTC, CTA, CTG = TTR, CTN
M = ATG = ATG
N = AAT, AAC = AAY
P = CCT, CCC, CCA, CCG = CCN
Q = CAA, CAG = CAR
R = CGT, CGC, CGA, CGG, AGA, AGG = CGN, AGR
S = TCT, TCC, TCA, TCG, AGT, AGC = TCN, AGY
T = ACT, ACC, ACA, ACG = ACN
V = GTT, GTC, GTA, GTG = GTN
W = TGG = TGG
Y = TAT, TAC = TAY
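The table can be written down as a plain Python dictionary and used to
enumerate the ambiguous back-translations of a protein. This is only a
sketch: the names AMBIGUOUS_CODONS and ambiguous_back_translations are made
up for illustration and are not part of Biopython.

```python
from itertools import product

# Ambiguous codons per residue, copied from the table above
# (standard genetic code). Hypothetical name, not a Biopython object.
AMBIGUOUS_CODONS = {
    "*": ["TAR", "TGA"],
    "A": ["GCN"],
    "C": ["TGY"],
    "D": ["GAY"],
    "E": ["GAR"],
    "F": ["TTY"],
    "G": ["GGN"],
    "H": ["CAY"],
    "I": ["ATH"],
    "K": ["AAR"],
    "L": ["TTR", "CTN"],
    "M": ["ATG"],
    "N": ["AAY"],
    "P": ["CCN"],
    "Q": ["CAR"],
    "R": ["CGN", "AGR"],
    "S": ["TCN", "AGY"],
    "T": ["ACN"],
    "V": ["GTN"],
    "W": ["TGG"],
    "Y": ["TAY"],
}


def ambiguous_back_translations(protein):
    """Yield every ambiguous DNA string that encodes `protein`."""
    choices = (AMBIGUOUS_CODONS[aa] for aa in protein)
    for combo in product(*choices):
        yield "".join(combo)


for dna in ambiguous_back_translations("ML"):
    print(dna)
```

For the two-residue protein ML this prints ATGTTR and ATGCTN, one string
per ambiguous codon of leucine.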
Although this is still not a one-to-one mapping in 4 of the 21 cases
(L, R, S and *), the combinatorial explosion is greatly reduced. For
example, the protein ACDEFGHIKLMNPQRSTVWY* has 1,019,215,872 unambiguous
back-translations. Using the table above it has 16, or in general
2^(L+R+S+*), where L, R, S and * are the numbers of times each of those
residues occurs in the sequence.
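Both counts can be checked with a few lines of Python. The sketch below
assumes the standard genetic code written as a 64-letter string, with
codons in TTT, TTC, TTA, TTG, TCT, ... order (bases ordered T, C, A, G).
The function names are invented for this example.

```python
from collections import Counter
from math import prod  # Python 3.8 or later

# Standard genetic code, one letter per codon, bases ordered T, C, A, G.
# Assumed layout; only the per-residue letter counts matter here.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# Number of codons that encode each residue, e.g. L -> 6, M -> 1.
CODON_COUNT = Counter(AMINO_ACIDS)


def count_unambiguous(protein):
    """Number of plain (A/C/G/T only) back-translations."""
    return prod(CODON_COUNT[aa] for aa in protein)


def count_ambiguous(protein):
    """Number of back-translations when using the ambiguous-codon table."""
    # Only L, R, S and * need two ambiguous codons; all others need one.
    return 2 ** sum(protein.count(aa) for aa in "LRS*")


protein = "ACDEFGHIKLMNPQRSTVWY*"
print(count_unambiguous(protein))  # 1019215872
print(count_ambiguous(protein))    # 16
```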
If anyone has an algorithm for determining the set of non-overlapping
ambiguous codons for an arbitrary codon table, I would like to know. Thanks,
Jon
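One possible starting point, offered as a rough sketch rather than a
proven answer, is a greedy cover: for each residue, list every IUPAC
ambiguous codon whose expansion lies entirely inside that residue's codon
set, then repeatedly take the largest one that still fits the uncovered
codons. Ties are broken alphabetically, which is an arbitrary choice, and
greedy selection is not guaranteed to give the fewest codons for every
codon table. The function names and the 64-letter encoding of the standard
code are assumptions made for this example.

```python
from itertools import product

# IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand(ambiguous_codon):
    """Return the set of plain codons an ambiguous codon stands for."""
    return {"".join(c) for c in product(*(IUPAC[s] for s in ambiguous_codon))}


def ambiguous_codons_for(codons):
    """Greedily cover `codons` with non-overlapping ambiguous codons."""
    remaining = set(codons)
    # Keep only ambiguous codons that expand to codons of this residue.
    candidates = []
    for symbols in product(IUPAC, repeat=3):
        amb = "".join(symbols)
        expansion = expand(amb)
        if expansion <= remaining:
            candidates.append((amb, expansion))
    # Largest expansion first; alphabetical order breaks ties.
    candidates.sort(key=lambda item: (-len(item[1]), item[0]))
    chosen = []
    for amb, expansion in candidates:
        # A candidate inside the uncovered set cannot overlap earlier picks.
        if expansion <= remaining:
            chosen.append(amb)
            remaining -= expansion
    return chosen


# Standard genetic code, one letter per codon, bases ordered T, C, A, G
# (assumed layout: TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)

# Group codons by the residue they encode.
table = {}
for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS):
    table.setdefault(aa, set()).add("".join(codon))

for aa in sorted(table):
    print(aa, "=", ", ".join(ambiguous_codons_for(table[aa])))
```

With this tie-break the standard code gives the same codons as the table
above, although not always listed in the same order. A different
tie-break would give other valid covers, for example YTR and CTY for
leucine.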
--
Jonathan Blakes
School of Computer Science
University of Nottingham