[BioPython] Re: back-translation method for Seq object?
Bruce Southey
bsouthey at gmail.com
Wed Oct 22 15:04:29 UTC 2008
Peter wrote:
> On Wed, Oct 22, 2008 at 9:31 AM, Leighton Pritchard <lpritc at scri.ac.uk> wrote:
>
>> On 21/10/2008 21:36, "Bruce Southey" <bsouthey at gmail.com> wrote:
>>
>>
>>> For completeness as these are not 100% correct,
>>> Leu/L =(TTA|TTG|CTT|CTC|CTA|CTG) = (TTN|CTR) = YTN
>>> Arg/R =(CGT|CGC|CGA|CGG|AGA|AGG) =(CGV | AGR) = MGV
>>> Ser/S =(TCT|TCC|TCA|TCG|AGT|AGC) =(TCN|AGY) = WSN
>>>
>
> I was going to jump up and down and disagree with you here Bruce, but
> Leighton has already made the same point, (CGV | AGR) != MGV etc.
> It is true that the ambiguous codon MGV would cover all the possible
> Arg codons, but it includes more than that. While this could be a
> useful thing for certain back-translation reasons, it does break the
> expectation that translate(back_translate(sequence)) == sequence
> [currently the behaviour available in Bio.Translate].
>
Leighton does show these are correct:
(CGV | AGR) == MGV
and MGV == (CGV | AGR)
BUT I fully agree that MGV also stands for other codons that do not
translate to Arg, as Leighton pointed out. This was why I prefaced
this by stating "these are not 100% correct", so I am sorry that I was
not clear enough. Yes, I am also very aware that this creates a problem
for doing a translate(back_translate(sequence)) without using a special
translation table (yet another reason for not including it in the Seq
object, or for just raising an exception).
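To make the point about extra codons concrete, here is a minimal sketch in
plain Python (no Biopython calls) that expands an ambiguous codon into its
concrete codons and translates each with the standard genetic code. The
names IUPAC, CODON_TABLE, expand and translations are hypothetical helpers
written for this illustration, not part of any library.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def expand(codon):
    """Return every unambiguous codon matched by an ambiguous codon."""
    return ["".join(p) for p in product(*(IUPAC[base] for base in codon))]


def translations(codon):
    """Return the set of amino acids an ambiguous codon can encode."""
    return {CODON_TABLE[c] for c in expand(codon)}


# The three collapsed codons from the thread and the residue each is meant for.
for codon, target in (("YTN", "L"), ("MGV", "R"), ("WSN", "S")):
    extra = sorted(c for c in expand(codon) if CODON_TABLE[c] != target)
    print(codon, sorted(translations(codon)), "non-target codons:", extra)
```

For MGV the expansion includes AGC, which is a Ser codon, so a plain
translate() of a back-translated MGV cannot be guaranteed to return Arg.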
As I pointed out in your other thread, I do not believe that
back-translation should be part of the Seq object, if for no other
reason than that back-translation creates too many ambiguous nucleotides
in one DNA sequence. This will cause some of the algorithms that decide
whether a sequence is protein or DNA to fail (back_translate('AFLFQPQRFGR')
gives 'GCNTTYYTNTTYCARCCNCARMGVTTYGGNMGV', which causes NCBI's online BLASTN
to say it is protein). In any case, BLAST and similar tools are not very
good at handling multiple ambiguous nucleotides in a sequence when probably
one-third to one-half of the sequence would be ambiguous nucleotides.
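As a rough check of the one-third to one-half estimate, the following sketch
counts the non-ACGT letters in the back-translated example above. The
function name ambiguous_fraction is hypothetical and only for illustration.

```python
def ambiguous_fraction(seq):
    """Return (ambiguous count, length, fraction) for a nucleotide string."""
    # Anything other than A, C, G or T is treated as an ambiguity code.
    ambiguous = sum(1 for base in seq.upper() if base not in "ACGT")
    return ambiguous, len(seq), ambiguous / len(seq)


seq = "GCNTTYYTNTTYCARCCNCARMGVTTYGGNMGV"
count, length, fraction = ambiguous_fraction(seq)
print(f"{count} of {length} bases are ambiguous ({fraction:.0%})")
```

For this sequence, 14 of the 33 bases are ambiguity codes, which is about 42%.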
Bruce