[Biopython-dev] Bio.Seq: implementing of translation of gapped sequences

Peter Cock p.j.a.cock at googlemail.com
Fri Dec 18 17:42:13 UTC 2015


I've just merged this; gapped translation is now supported :)

https://github.com/biopython/biopython/commit/72c16670a15911c28f7555ba32f1eade56362587

Thanks Carlos!

Peter

On Tue, Nov 3, 2015 at 2:54 PM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> Hi all,
>
> The pull request from Carlos for gap codon translation is here:
>
> https://github.com/biopython/biopython/pull/661
>
> The proposed change adds a gap argument to the translate
> method (should this be gap_char to match the alphabet object?),
> and by default takes the gap character from the alphabet.
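The following is a minimal sketch of the precedence described here. It assumes that an explicit gap argument takes priority over the alphabet's gap character. `resolve_gap` and `alphabet_gap` are hypothetical names used for illustration and are not Biopython's API. `alphabet_gap` stands in for whatever gap character a Gapped alphabet declares.

```python
from typing import Optional


def resolve_gap(gap: Optional[str], alphabet_gap: Optional[str]) -> Optional[str]:
    """Pick the gap character to use when translating (hypothetical helper).

    Assumption: an explicit ``gap`` argument wins. Otherwise we fall back to
    the gap character declared by the sequence's alphabet, if there is one.
    Returning None means no gap character is known, so gap codons stay
    untranslatable.
    """
    if gap is not None:
        # Only a single-character gap symbol is accepted in this sketch.
        if len(gap) != 1:
            raise ValueError("gap must be a single character")
        return gap
    return alphabet_gap
```

With this sketch, `resolve_gap(None, "-")` returns `'-'` and `resolve_gap("~", "-")` returns `'~'`. `resolve_gap(None, None)` returns `None`.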
>
> I've written up some examples here:
>
> https://github.com/biopython/biopython/pull/661#issuecomment-153376803
>
> There is one potentially surprising change to existing behaviour.
> Previously, a gap codon would always raise a TranslationError:
>
>>>> from Bio.Seq import Seq
>>>> from Bio.Alphabet import generic_dna, Gapped
>>>> Seq("ACT---TAA", Gapped(generic_dna)).translate()
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "Bio/Seq.py", line 1004, in translate
>     stop_symbol, to_stop, cds)
>   File "Bio/Seq.py", line 2064, in _translate_str
>     "Codon '{0}' is invalid".format(codon))
> Bio.Data.CodonTable.TranslationError: Codon '---' is invalid
>
> If the gap character is made explicit via the alphabet, this becomes:
>
>>>> from Bio.Seq import Seq
>>>> from Bio.Alphabet import generic_dna, Gapped
>>>> Seq("ACT---TAA", Gapped(generic_dna)).translate()
> Seq('T-*', HasStopCodon(Gapped(ExtendedIUPACProtein(), '-'), '*'))
>
> Or, using a different gap character:
>
>>>> Seq("ACT~~~TAA", Gapped(generic_dna, "~")).translate()
> Seq('T~*', HasStopCodon(Gapped(ExtendedIUPACProtein(), '~'), '*'))
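To make the examples above concrete, here is a self-contained sketch of gapped translation in plain Python that does not require Biopython. It uses the NCBI standard code (table 1) and maps a codon made of exactly three gap characters to a single gap. Any other untranslatable codon raises an error, which matches the '---' error shown earlier. `translate_gapped` and the local `TranslationError` class are hypothetical stand-ins. This sketch also makes two assumptions of its own: it treats partially gapped codons such as 'A-T' as invalid, and it rejects lengths that are not a multiple of three.

```python
from itertools import product
from typing import Optional

# NCBI standard genetic code (table 1): codons enumerated in TCAG order,
# with the first base varying slowest (itertools.product order).
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


class TranslationError(ValueError):
    """Local stand-in for Bio.Data.CodonTable.TranslationError."""


def translate_gapped(nuc: str, gap: Optional[str] = None, stop_symbol: str = "*") -> str:
    """Translate DNA, turning each all-gap codon into one gap character."""
    if len(nuc) % 3:
        # Choice made for this sketch only: refuse trailing partial codons.
        raise TranslationError("Sequence length is not a multiple of three")
    protein = []
    for i in range(0, len(nuc), 3):
        codon = nuc[i:i + 3]
        if gap is not None and codon == gap * 3:
            protein.append(gap)  # full gap codon -> single gap residue
        elif codon.upper() in STANDARD_TABLE:
            aa = STANDARD_TABLE[codon.upper()]
            protein.append(stop_symbol if aa == "*" else aa)
        else:
            # Covers '---' when no gap is known, and mixed codons like 'A-T'.
            raise TranslationError(f"Codon '{codon}' is invalid")
    return "".join(protein)
```

With this sketch, `translate_gapped("ACT---TAA", gap="-")` gives `'T-*'` and `translate_gapped("ACT~~~TAA", gap="~")` gives `'T~*'`. These mirror the outputs above. Calling `translate_gapped("ACT---TAA")` without a gap character raises `TranslationError`.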
>
> I think this is a small change, and worthwhile overall for the
> useful functionality. It's something I was planning to add myself
> at some point but had never gotten round to.
>
> Thoughts? Feedback? You can use GitHub if you prefer:
>
> https://github.com/biopython/biopython/pull/661
>
>
> Regards,
>
> Peter
>
>
>
> On Tue, Nov 3, 2015 at 8:26 AM, Carlos Peña <mycalesis at gmail.com> wrote:
>> Hi all,
>>
>>
>> I have prepared a pull request implementing the translation of gapped
>> sequences (thanks Peter for the guidelines!).
>>
>> The code will infer the gap character from the Seq object's alphabet.
>> If the alphabet does not specify one, a gap character can optionally be
>> passed instead. When a gap character is known, the result is a protein
>> sequence with gaps; otherwise a TranslationError is raised.
>>
>> At least for me, this change will simplify the code of my projects.
>> However, this implementation might bite you if you are expecting a
>> TranslationError from trying to translate a gapped sequence. Instead, you
>> will get back a gapped protein sequence (if your gap consists of dashes "-").
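Code that relied on the old error could check for gap codons explicitly before translating. Here is a minimal sketch of that approach. It assumes plain Python strings and a single-character gap. `gap_codon_indices` and `require_ungapped` are hypothetical helpers and are not part of Biopython.

```python
from typing import List


def gap_codon_indices(nuc: str, gap: str = "-") -> List[int]:
    """Return 0-based indices of codons made entirely of ``gap``.

    Any trailing partial codon is ignored.
    """
    return [
        i // 3
        for i in range(0, len(nuc) - len(nuc) % 3, 3)
        if nuc[i:i + 3] == gap * 3
    ]


def require_ungapped(nuc: str, gap: str = "-") -> None:
    """Restore the old strictness: raise if any full gap codon is present."""
    found = gap_codon_indices(nuc, gap)
    if found:
        raise ValueError(f"Gap codon(s) at codon index {found}")
```

For example, `gap_codon_indices("ACT---TAA")` returns `[1]`, so `require_ungapped("ACT---TAA")` raises before any translation is attempted.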
>>
>> Is this change desirable for the Biopython project? I noticed that the
>> scikit-bio project (https://github.com/biocore/scikit-bio) does not
>> implement gapped translation, but I don't know why.
>>
>>
>> cheers
>>
>>
>> carlos
>>
>>
>> Dr. Carlos Peña
>> Laboratory of Genetics
>> Department of Biology
>> University of Turku
>> 20014 Turku
>> FINLAND
>>
>> * Associate Editor: Revista peruana de Biología
>> http://is.gd/TwbW
>>
>> * The Nymphalidae Systematics Group
>> http://nymphalidae.utu.fi/db.php
>>
>> _______________________________________________
>> Biopython-dev mailing list
>> Biopython-dev at mailman.open-bio.org
>> http://mailman.open-bio.org/mailman/listinfo/biopython-dev


