[Biopython-dev] [Bug 2783] Using alternative start codons in Bio.Seq translate method/function

bugzilla-daemon at portal.open-bio.org bugzilla-daemon at portal.open-bio.org
Mon May 11 12:40:49 UTC 2009


------- Comment #5 from biopython-bugzilla at maubp.freeserve.co.uk  2009-05-11 08:40 EST -------
Created an attachment (id=1298)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1298&action=view)
Patch for Bio/Seq.py to support complete CDS translation with non-standard
start codons

I've recently been doing CDS translations for viral/bacterial genes with
alternative start codons - and would like to fix this limitation in Biopython,
rather than having to hack around it.

On Bug 2381, comment #14, I wrote:
> For comparison, the following is copied from the BioPerl documentation about
> their sequence object's translate method.  It would be nice to follow some of
> the same naming conventions for any optional arguments.
> http://www.bioperl.org/Core/Latest/bptutorial.html#iii_3_1_manipulating_sequence_data_with_seq_methods
> If we want to translate full coding regions (CDS) the way major nucleotide
> databanks EMBL, GenBank and DDBJ do it, the translate() method has to perform
> more checks. Specifically, translate() needs to confirm that the sequence has
> appropriate start and terminator codons at the very beginning and the very end
> of the sequence and that there are no terminator codons present within the
> sequence in frame 0. In addition, if the genetic code being used has an
> atypical (non-ATG) start codon, the translate() method needs to convert the
> initial amino acid to methionine. These checks and conversions are triggered
> by setting ``complete'' to 1:
>   $prot_obj = $my_seq_object->translate(-complete => 1);
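
For illustration only, the checks described in that BioPerl text can be sketched in plain Python. This is a minimal sketch, not the code in the attached patch: the function name translate_cds and the START_CODONS set (an illustrative subset of bacterial start codons) are assumptions made for the example, and only the standard genetic code is used.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Assumption: an illustrative subset of bacterial start codons.
START_CODONS = {"ATG", "GTG", "TTG"}


def translate_cds(seq, start_codons=START_CODONS):
    """Strictly translate a complete CDS (hypothetical helper)."""
    seq = seq.upper()
    if not seq or len(seq) % 3 != 0:
        raise ValueError("sequence length is not a positive multiple of three")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if codons[0] not in start_codons:
        raise ValueError("first codon %r is not a start codon" % codons[0])
    if CODON_TABLE[codons[-1]] != "*":
        raise ValueError("final codon %r is not a stop codon" % codons[-1])
    body = "".join(CODON_TABLE[c] for c in codons[1:-1])
    if "*" in body:
        raise ValueError("extra in-frame stop codon found")
    # The start codon always becomes methionine; the final stop is dropped.
    return "M" + body


print(translate_cds("GTGAAATAA"))  # MK
```

Here GTG would normally be read as valine, but as the first codon of a complete CDS it gives methionine.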

On Bug 2381, comment #51, Leighton wrote:
> In terms of nomenclature:
> The default behaviour of translate() as Peter proposed: read through in-frame
> and translate with the appropriate codon table - is fine in nearly all
> circumstances.  Most other circumstances are covered by stopping at the first
> in-frame stop codon, which Peter has implemented, and is an option we all seem
> to agree on.
> Biologically-speaking, this behaviour is not always correct for CDS in
> prokaryotes, where alternative start codons may occur a significant minority
> of the time.  These will be mistranslated if no provision is made for them.  I
> think a useful biological sequence object should at least try to mimic actual
> biology, so we should provide an option to handle this.
> We should not assume that a sequence is a CDS unless it is specified by the
> user.  It seems reasonable to me that the term 'cds' should occur in any such
> argument from the user.
> We have at least two options for how to proceed with a CDS: i) we can provide
> a strict CDS-type translation, which requires confirmation that the sequence
> is, in fact, a CDS; ii) we can provide a weak CDS-type translation, which only
> modifies the way the start codon is translated.  In both cases, behaviour is
> specific to CDS, and so having 'cds' in the argument name *somewhere* seems
> obvious, and entirely reasonable.
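
To make the contrast with the strict approach concrete, the weak option (ii) might look roughly like the following. Again this is only a sketch under assumptions, not the earlier patch: translate_start_only and START_CODONS are hypothetical names, the start codon set is an illustrative subset, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Assumption: an illustrative subset of bacterial start codons.
START_CODONS = {"ATG", "GTG", "TTG"}


def translate_start_only(seq, start_codons=START_CODONS):
    """Translate in frame, changing only how the first codon is read."""
    seq = seq.upper()
    if len(seq) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    protein = [CODON_TABLE[c] for c in codons]
    # No check for a terminal stop or for internal stops is made here.
    if codons and codons[0] in start_codons:
        protein[0] = "M"
    return "".join(protein)


print(translate_start_only("TTGAAATAAGGG"))  # MK*G
```

Unlike a strict CDS translation, nothing is rejected: stop codons are reported as "*" wherever they occur, and a sequence whose first codon is not a start codon is simply translated as usual.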

Leighton's option (ii) is start codon only modification.  This is what I
implemented in the patch on comment 1 (attachment 1259).  We haven't agreed on
a good name for this - which is partly why I went back to revisit the

Leighton's option (i) is strict CDS-type translation.  As Leighton suggests,
having "cds" in the argument name here makes sense.  Regarding the BioPerl
argument name for this functionality, "complete", on Bug 2381 comment 19,
Martin wrote:
> The "complete" is a cryptic naming, I wouldn't be fond of it.

I think you are both right about the naming.  Would complete_cds=True be
clear?  In fact, I quite like the idea of using cds=True which is short and
also fairly clear.  This patch adds a complete_cds=Boolean argument to the
Bio.Seq translate methods and function, which should act like the BioPerl
equivalent.  It includes doctests showing the new functionality.

I would like to use either of these approaches in Biopython - but not both ;)
