[Biopython-dev] [Bug 2381] translate and transcribe methods for the Seq object (in Bio.Seq)

bugzilla-daemon at portal.open-bio.org
Thu Nov 6 15:07:06 UTC 2008


http://bugzilla.open-bio.org/show_bug.cgi?id=2381





------- Comment #43 from bsouthey at gmail.com  2008-11-06 10:07 EST -------
(In reply to comment #39)
> Created an attachment (id=1040)
> --> (http://bugzilla.open-bio.org/attachment.cgi?id=1040&action=view)
> Patch to Bio/Seq.py for complete CDS translation.
> 
> (In reply to comment #33)
> > Instead of the "init" start codon option in attachment 1032,
> > I'd also be happy with a single boolean argument which does
> > start codon validation, treats this as a methionine, checks
> > the sequence is a multiple of three in length, checks for a
> > final stop codon, and checks for no additional stop codons.
> > We'd ruled out calling this "complete", but maybe "cds"
> > would be better?
> 
> This patch adds this functionality via a "complete_cds" boolean argument.
> 
> Here is how it could be applied to translate the CDS used as an example in my
> comment 35, the yaaX gene in E. coli K12:
> 
> >>> from Bio.Seq import Seq
> >>> my_cds = Seq("GTGAAAAAGATGCAATCTATCGTACTCGCACTTTCCCTGGTTCTGGTCGCTCCCATGGCAGCACAGGCTGCGGAAATTACGTTAGTCCCGTCAGTAAAATTACAGATAGGCGATCGTGATAATCGTGGCTATTACTGGGATGGAGGTCACTGGCGCGACCACGGCTGGTGGAAACAACATTATGAATGGCGAGGCAATCGCTGGCACCTACACGGACCGCCGCCACCGCCGCGCCACCATAAGAAAGCTCCTCATGATCATCACGGCGGTCATGGTCCAGGCAAACATCACCGCTAA")
> >>> my_cds.translate(table=11)
> Seq('VKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDH...HR*',
> HasStopCodon(ExtendedIUPACProtein(), '*'))
> >>> my_cds.translate(table=11, to_stop=True)
> Seq('VKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDH...HHR',
> ExtendedIUPACProtein())
> >>> my_cds.translate(table=11, complete_cds=True)
> Seq('MKKMQSIVLALSLVLVAPMAAQAAEITLVPSVKLQIGDRDNRGYYWDGGHWRDH...HHR',
> ExtendedIUPACProtein())
> 
> I would be happy with EITHER of these options, as both can be used to translate
> a complete coding sequence:
> 
> (1) the "init" argument (under another name, maybe "cds_start"?) illustrated in
> attachment 1032.  This would check the start codon is valid AND translate it as
> a methionine.
> 
> (2) the "complete_cds" argument (perhaps under another name, maybe "cds"?)
> illustrated in this patch.  This would check the start codon is valid AND
> translate it as a methionine AND check there are a whole number of codons AND
> check it ends with a stop codon AND check there are no extra in-frame stop
> codons.
> 


I support (1) but strongly disagree with (2), because 'cds' refers to a complete
coding DNA sequence, not merely to whether the sequence starts with M.
http://www.yeastgenome.org/help/glossary.html
"CDS:    CoDing Sequence, region of nucleotides that corresponds to the
sequence of amino acids in the predicted protein. The CDS includes start and
stop codons, therefore coding sequences begin with an "ATG" and end with a stop
codon. In SGD, unexpressed sequences, including the 5'-UTR, the 3'-UTR,
introns, or bases not expressed due to frameshifting, are not included within a
CDS. Note that the CDS does not correspond to the actual mRNA sequence."

However, I do like being able to obtain the translation of the actual CDS -
just not here.
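
To illustrate what I mean by "not here": the checks listed under (2) could
live in a separate helper rather than in an argument to translate(). The
following is only a minimal sketch under my own assumptions. The name
translate_complete_cds is hypothetical (it is not in Bio.Seq or in either
attachment), it works on a plain unambiguous DNA string, and it hard-codes
NCBI table 11.

```python
# Hypothetical helper, NOT part of Bio.Seq or of either attached patch.
# Assumes an unambiguous DNA string (A, C, G, T only) and NCBI table 11.

BASES = "TCAG"
# Amino acids for the 64 codons in TCAG order (identical in tables 1 and 11).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
# Start codons listed for NCBI table 11.
TABLE_11_START_CODONS = {"TTG", "CTG", "ATT", "ATC", "ATA", "ATG", "GTG"}


def translate_complete_cds(dna):
    """Translate a complete CDS, raising ValueError if any check fails."""
    dna = dna.upper()
    if set(dna) - set(BASES):
        raise ValueError("only unambiguous A/C/G/T bases are supported")
    # Check 1: a whole number of codons.
    if len(dna) % 3 != 0:
        raise ValueError("length is not a multiple of three")
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    # Check 2: the first codon is a valid start codon.
    if not codons or codons[0] not in TABLE_11_START_CODONS:
        raise ValueError("first codon is not a valid start codon")
    # Check 3: the last codon is a stop codon.
    if CODON_TABLE[codons[-1]] != "*":
        raise ValueError("final codon is not a stop codon")
    # The start codon is always read as methionine; the stop is dropped.
    protein = "M" + "".join(CODON_TABLE[c] for c in codons[1:-1])
    # Check 4: no additional in-frame stop codons.
    if "*" in protein:
        raise ValueError("extra in-frame stop codon found")
    return protein


print(translate_complete_cds("GTGAAAAAGATGTAA"))  # MKKM
```

The input above is just the first four codons of the quoted yaaX example plus
a stop codon, so the leading GTG comes out as M rather than V.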

I do not support the name 'init', for the reasons already discussed.

I do not support the name 'cds_start' either, because of the DNA interpretation
and because many GenBank records include the upstream and downstream non-coding
regions. In such cases I would have to find the actual start codon first, and
then I might as well translate from that start codon rather than rely on a
check that might be wrong.
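
A small sketch of that situation, again under my own assumptions: the record
string and the coordinates are made up, the helper names are hypothetical, and
the coordinates are taken to be 0-based and end-exclusive like a Python slice.

```python
# Hypothetical illustration: a record with non-coding flanks around the CDS.
# Names and coordinates are invented; nothing here is Biopython API.

TABLE_11_START_CODONS = {"TTG", "CTG", "ATT", "ATC", "ATA", "ATG", "GTG"}


def extract_cds(record_seq, start, end):
    """Slice the annotated coding region out of the full record sequence."""
    return record_seq[start:end].upper()


def has_valid_start(dna):
    """Report whether the first three bases form a table 11 start codon."""
    return dna[:3].upper() in TABLE_11_START_CODONS


# 5 bases of upstream flank, a 9 base CDS, 3 bases of downstream flank.
record_seq = "CCGGA" + "GTGAAATAA" + "TTT"

# A start codon check on the raw record looks at the flank, not the CDS.
print(has_valid_start(record_seq))  # False

# Once the annotated start is known, the slice begins at the real start codon.
cds = extract_cds(record_seq, 5, 14)
print(cds)                   # GTGAAATAA
print(has_valid_start(cds))  # True
```

In other words, the useful work is locating the start from the annotation;
after that the check adds little.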

Perhaps some variant of:
a) Similar cases in Python:
has_met or has_met1
get_met or get_met1
b) More direct meaning:
starts_with_methionine, starts_with_met, starts_with_m

