[Biopython-dev] [Bug 1963] Adding __str__ method to codon tables and translators

bugzilla-daemon at portal.open-bio.org bugzilla-daemon at portal.open-bio.org
Sun Feb 26 10:35:32 EST 2006


http://bugzilla.open-bio.org/show_bug.cgi?id=1963





------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2006-02-26 10:35 -------
Revised version which:
* Uses the "conventional" nucleotide ordering
* Works for the ambiguous tables
* Shows the table's ID and name(s)

Again, add this method to the CodonTable class in
Bio/Data/CodonTable.py:

    def __str__(self) :
        """Returns a simple text representation of the codon table"""
        if self.id :
            answer = "Table %i" % self.id
        else :
            answer = "Table ID unknown"
        if self.names :
            answer = answer + " " + ", ".join(filter(None, self.names))

        """
        #Use the conventional ordering for the codon table
        #and only use the main four - even for ambiguous tables
        letters = self.nucleotide_alphabet.letters
        if "T" in letters :
            #DNA
            letters = "TCAG"
        elif "U" in letters :
            #RNA
            letters = "UCAG"
        else :
            print "WARNING - Unexpected alphabet"
        """

        #Use the conventional ordering for the codon table
        letters = self.nucleotide_alphabet.letters
        if "GATC" == letters :
            #DNA
            letters = "TCAG"
        elif "GAUC" == letters :
            #RNA
            letters = "UCAG"


        answer=answer + "\n\n  |" + "|".join( \
            ["  %s      " % c2 for c2 in letters] \
            ) + "|"
        answer=answer + "\n--+" \
               + "+".join(["---------" for c2 in letters]) + "+--"
        for c1 in letters :
            for c3 in letters :
                line = c1 + " |"
                for c2 in letters :
                    codon = c1+c2+c3
                    line = line + " %s" % codon
                    if codon in self.stop_codons :
                        line = line + " Stop|"
                    else :
                        try :
                            amino = self.forward_table[codon]
                        except (KeyError, TranslationError) :
                            #Codon missing from (or ambiguous in) this table
                            amino = "?"
                        if codon in self.start_codons :
                            line = line + " %s(s)|" % amino
                        else :
                            line = line + " %s   |" % amino
                line = line + " " + c3
                answer = answer + "\n"+ line 
            answer=answer + "\n--+" \
                  + "+".join(["---------" for c2 in letters]) + "+--"
        return answer

Example:

>>> import Bio.Data.CodonTable
>>> print Bio.Data.CodonTable.unambiguous_dna_by_id[11]
Table 11 Bacterial

  |  T      |  C      |  A      |  G      |
--+---------+---------+---------+---------+--
T | TTT F   | TCT S   | TAT Y   | TGT C   | T
T | TTC F   | TCC S   | TAC Y   | TGC C   | C
T | TTA L   | TCA S   | TAA Stop| TGA Stop| A
T | TTG L(s)| TCG S   | TAG Stop| TGG W   | G
--+---------+---------+---------+---------+--
C | CTT L   | CCT P   | CAT H   | CGT R   | T
C | CTC L   | CCC P   | CAC H   | CGC R   | C
C | CTA L   | CCA P   | CAA Q   | CGA R   | A
C | CTG L(s)| CCG P   | CAG Q   | CGG R   | G
--+---------+---------+---------+---------+--
A | ATT I(s)| ACT T   | AAT N   | AGT S   | T
A | ATC I(s)| ACC T   | AAC N   | AGC S   | C
A | ATA I(s)| ACA T   | AAA K   | AGA R   | A
A | ATG M(s)| ACG T   | AAG K   | AGG R   | G
--+---------+---------+---------+---------+--
G | GTT V   | GCT A   | GAT D   | GGT G   | T
G | GTC V   | GCC A   | GAC D   | GGC G   | C
G | GTA V   | GCA A   | GAA E   | GGA G   | A
G | GTG V(s)| GCG A   | GAG E   | GGG G   | G
--+---------+---------+---------+---------+--
>>> print Bio.Data.CodonTable.unambiguous_rna_by_id[1]
Table 1 Standard, SGC0

  |  U      |  C      |  A      |  G      |
--+---------+---------+---------+---------+--
U | UUU F   | UCU S   | UAU Y   | UGU C   | U
U | UUC F   | UCC S   | UAC Y   | UGC C   | C
U | UUA L   | UCA S   | UAA Stop| UGA Stop| A
U | UUG L(s)| UCG S   | UAG Stop| UGG W   | G
--+---------+---------+---------+---------+--
C | CUU L   | CCU P   | CAU H   | CGU R   | U
C | CUC L   | CCC P   | CAC H   | CGC R   | C
C | CUA L   | CCA P   | CAA Q   | CGA R   | A
C | CUG L(s)| CCG P   | CAG Q   | CGG R   | G
--+---------+---------+---------+---------+--
A | AUU I   | ACU T   | AAU N   | AGU S   | U
A | AUC I   | ACC T   | AAC N   | AGC S   | C
A | AUA I   | ACA T   | AAA K   | AGA R   | A
A | AUG M(s)| ACG T   | AAG K   | AGG R   | G
--+---------+---------+---------+---------+--
G | GUU V   | GCU A   | GAU D   | GGU G   | U
G | GUC V   | GCC A   | GAC D   | GGC G   | C
G | GUA V   | GCA A   | GAA E   | GGA G   | A
G | GUG V   | GCG A   | GAG E   | GGG G   | G
--+---------+---------+---------+---------+--
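
The columns above line up because every cell is exactly nine characters
wide, matching the "---------" rule segments. As a minimal sketch of that
cell layout (format_cell is a hypothetical helper for illustration, not
part of the proposed patch or of Biopython):

```python
def format_cell(codon, amino, is_stop=False, is_start=False):
    """Return one 9-character table cell, without the trailing '|'.

    Hypothetical helper mirroring the layout in __str__ above:
    a stop codon takes precedence over a start codon.
    """
    if is_stop:
        label = "Stop"
    elif is_start:
        label = amino + "(s)"   # start codons are flagged with (s)
    else:
        label = amino + "   "   # pad so all cells share one width
    return " %s %s" % (codon, label)
```

For instance, format_cell("TTG", "L", is_start=True) gives " TTG L(s)",
the same text as the TTG cell in the Table 11 output.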

Question One:
Is this worth adding to Biopython or not?

Question Two:
What is the preferred behaviour for ambiguous tables?  Just a 4x4x4 table, as
for the unambiguous tables?  Or the full 15x15x15 table?  I have implemented
both (see the commented-out code).
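
One way to offer both behaviours would be a flag that selects the letters
used for the rows and columns. This is only a sketch: table_letters and
main_four_only are hypothetical names, and putting the ambiguity letters
after the main four in their original order is an assumption, not an
established convention.

```python
def table_letters(alphabet_letters, main_four_only=True):
    """Return the nucleotide letters to use for table rows and columns.

    Hypothetical helper: alphabet_letters is a string such as "GATC" or
    "GATCRYWSMKHBVDN". Anything without a "U" is treated as DNA here
    (an assumption; the commented-out code prints a warning instead).
    """
    if "U" in alphabet_letters:
        main = "UCAG"   # conventional RNA ordering
    else:
        main = "TCAG"   # conventional DNA ordering
    if main_four_only:
        return main     # 4x4x4 table
    # Full table: main four first, then the remaining (ambiguity) letters
    extra = "".join([c for c in alphabet_letters if c not in main])
    return main + extra
```

With the 15-letter ambiguous DNA alphabet and main_four_only=False this
returns "TCAGRYWSMKHBVDN", which would give the 15x15x15 layout.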

Question Three:
Is there a standard Biopython function to convert one-letter amino acid
sequences into three-letter names?  Something like one_to_three from
Bio.PDB.Polypeptide, but more general: that function does not cope with
ambiguous names.
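
To make the request concrete, here is a rough sketch of the kind of
function meant. seq_one_to_three and THREE_LETTER are hypothetical names,
not existing Biopython API; the ambiguity codes follow the usual IUPAC
convention (B = Asx, Z = Glx, X = Xaa), and mapping any other unknown
letter to "Xaa" is an assumption.

```python
# Hypothetical lookup table, not taken from Biopython.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    # IUPAC ambiguity codes
    "B": "Asx",  # Asp or Asn
    "Z": "Glx",  # Glu or Gln
    "X": "Xaa",  # any or unknown residue
}

def seq_one_to_three(sequence, unknown="Xaa"):
    """Convert a one-letter amino acid string to three-letter names.

    Letters missing from THREE_LETTER are replaced by `unknown`
    (an assumption made for this sketch).
    """
    return "".join([THREE_LETTER.get(letter.upper(), unknown)
                    for letter in sequence])
```

For example, seq_one_to_three("MBZX") gives "MetAsxGlxXaa".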



