[BioPython] Re: back-translation method for Seq object?
Bruce Southey
bsouthey at gmail.com
Tue Oct 21 20:36:31 UTC 2008
Peter wrote:
> Hi everyone,
>
> I think we all agree that if we want a back-translation
> method/function to return a simple string or Seq object (given no
> additional information about the codon use), this cannot fully capture
> all the possible codons.
>
For completeness, since these are not 100% correct (the single
ambiguous codons also match codons of other amino acids):
Leu/L = (TTA|TTG|CTT|CTC|CTA|CTG) = (TTR|CTN) = YTN
Arg/R = (CGT|CGC|CGA|CGG|AGA|AGG) = (CGN|AGR) = MGN
Ser/S = (TCT|TCC|TCA|TCG|AGT|AGC) = (TCN|AGY) = WSN
Ser is really so bad that one would suggest providing a strong warning
and just using NTN, NGN, and NNN for Leu, Arg and Ser, respectively.
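
One way to check how far such codes over-cover is to expand each
ambiguous codon into the concrete codons it matches and compare the
result with the codon lists above. This is only a minimal sketch using
the standard library; the helper name expand() is hypothetical and not
part of Biopython.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes for DNA.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand(pattern):
    """Return the set of concrete codons matched by an ambiguous codon."""
    return {"".join(c) for c in product(*(IUPAC[b] for b in pattern))}

# Codon sets from the standard table, as listed in the message.
LEU = {"TTA", "TTG", "CTT", "CTC", "CTA", "CTG"}
ARG = {"CGT", "CGC", "CGA", "CGG", "AGA", "AGG"}
SER = {"TCT", "TCC", "TCA", "TCG", "AGT", "AGC"}

# Two ambiguous codons per amino acid describe each set exactly.
exact_leu = expand("TTR") | expand("CTN")
exact_arg = expand("CGN") | expand("AGR")
exact_ser = expand("TCN") | expand("AGY")

# A single ambiguous codon covers the set but also matches extra codons.
extra_leu = expand("YTN") - LEU   # TTT, TTC (Phe)
extra_arg = expand("MGN") - ARG   # AGT, AGC (Ser)
extra_ser = expand("WSN") - SER   # ACN, TGN, AGA, AGG (10 codons)

print(sorted(extra_leu), sorted(extra_arg), len(extra_ser))
```

The single-codon forms always contain the wanted codons, so the
over-coverage is the only error, and it is largest for Ser.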
> If we want to provide a simple string or Seq object, we can either
> pick an arbitrary codon in each case (as in the first attachment on
> Bug 2618), or perhaps represent some of the possible codons using
> ambiguous nucleotides.
>
> e.g.
> back_translate("MR") = "ATGCGT" #arbitrary codon for R using unambiguous
> nucleotides
>
> or,
> back_translate("MR") = "ATGCGN" #arbitrary codon for R using ambiguous
> nucleotides
>
> Note in either example, the following nice property holds:
> translate(back_translate("MR")) == "MR"
>
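
The two variants and the round-trip property can be sketched as
follows, assuming the standard codon table. The functions
back_translate() and translate() here are hypothetical stand-alone
helpers written for illustration, not the Biopython functions, and the
"arbitrary" codon is simply the first one in TCAG table order.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons in TCAG order (TTT, TTC, TTA, ...).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Arbitrary choice: the first codon seen for each amino acid.
FIRST_CODON = {}
for codon, aa in CODON_TABLE.items():
    FIRST_CODON.setdefault(aa, codon)

def back_translate(protein, ambiguous=False):
    """Pick one codon per residue; optionally use N in the third position
    when all four third bases encode the same amino acid."""
    out = []
    for aa in protein:
        codon = FIRST_CODON[aa]
        prefix = codon[:2]
        if ambiguous and all(CODON_TABLE[prefix + b] == aa for b in BASES):
            codon = prefix + "N"
        out.append(codon)
    return "".join(out)

def translate(dna):
    """Translate DNA, accepting N only in the third codon position."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        if codon[2] == "N":
            aas = {CODON_TABLE[codon[:2] + b] for b in BASES}
        else:
            aas = {CODON_TABLE[codon]}
        # "X" marks a codon whose expansions disagree.
        protein.append(aas.pop() if len(aas) == 1 else "X")
    return "".join(protein)

print(back_translate("MR"))                  # ATGCGT
print(back_translate("MR", ambiguous=True))  # ATGCGN
print(translate(back_translate("MR", ambiguous=True)))  # MR
```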
> Even if improved by typical codon usage figures to give a more
> biologically likely answer, neither of these simple approaches covers
> the full set of six possible codons for Arg in the standard codon
> table.
>
> It was something like this that I envisioned as a candidate for a Seq
> method (based on the behaviour of the existing Bio.Translate
> functionality), but only if such a simple back_translate
> method/function had any real uses. And thus far, I haven't seen any.
>
For you perhaps, but my reasons are very real to me!
> A back translation method/function which dealt with all the possible
> codon choices would have to use a more advanced representation
> (possibly as Bruce suggested using regular expressions or some sort of
> tree structure - ideally as a sub-class of the Seq object). There is
> also the option of returning multiple simple strings or Seq objects
> (either as a list or preferably a generator) giving all possible back
> translations, but I don't think this would be useful, except perhaps
> on small examples, due to the potentially vast number of return
> values.
>
> Peter
>
>
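
To give a feel for the "potentially vast number of return values",
here is a minimal sketch of the generator option, assuming the standard
codon table. The function names are hypothetical. The number of back
translations is the product of the codon counts of the residues, so it
grows exponentially with sequence length.

```python
from itertools import product
from math import prod

BASES = "TCAG"
# Standard genetic code, codons in TCAG order (TTT, TTC, TTA, ...).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid to the list of its codons.
CODONS = {}
for codon, aa in zip(product(BASES, repeat=3), AMINO):
    CODONS.setdefault(aa, []).append("".join(codon))

def all_back_translations(protein):
    """Lazily yield every DNA string that translates to the protein."""
    for combo in product(*(CODONS[aa] for aa in protein)):
        yield "".join(combo)

def count_back_translations(protein):
    """Count the back translations without generating them."""
    return prod(len(CODONS[aa]) for aa in protein)

print(list(all_back_translations("MR")))
print(count_back_translations("MR"))        # 6
print(count_back_translations("MRLS"))      # 216
print(count_back_translations("L" * 100))   # 6**100
```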
In any situation, we are left with ambiguous codons, a regular
expression or some combination of sequence types (e.g., strings or Seq
objects). None of these options is fully compatible with the Seq
object. So I do agree that back-translation cannot be part of the Seq
object. I also agree that, while the first two could be return types for
a Seq object method, the usage is probably too infrequent and too
specialized for inclusion, especially if it must handle codon usage
frequencies.
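
As an illustration of the regular expression option, here is a minimal
sketch that assumes the standard codon table and ignores codon usage
frequencies. The function name back_translation_regex() is hypothetical
and not an existing Biopython function.

```python
import re
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons in TCAG order (TTT, TTC, TTA, ...).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid to the list of its codons.
CODONS = {}
for codon, aa in zip(product(BASES, repeat=3), AMINO):
    CODONS.setdefault(aa, []).append("".join(codon))

def back_translation_regex(protein):
    """Compile a regex with one alternation group per residue."""
    groups = ("(?:%s)" % "|".join(CODONS[aa]) for aa in protein)
    return re.compile("".join(groups))

pattern = back_translation_regex("MR")
print(pattern.pattern)  # (?:ATG)(?:CGT|CGC|CGA|CGG|AGA|AGG)
print(bool(pattern.fullmatch("ATGAGA")))  # True, AGA encodes Arg
print(bool(pattern.fullmatch("ATGAGT")))  # False, AGT encodes Ser
```

Unlike a single ambiguous codon, the pattern matches exactly the valid
codons, but the result is a compiled pattern, not a sequence.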
Bruce