[BioPython] Re: translating genes with >1 exon

Mon, 10 Apr 2000 10:41:51 -0600

On the bioperl list, Keith James asked about translating
an exon:
> Get the PrimarySeq object of each subsequence
> Get the sequence string of each PrimarySeq object
> Join the sequence strings together
> Make a new PrimarySeq object from the string
> Translate that, making yet another PrimarySeq object

In the proposal code I've been working on, I added
__add__ (and I should add __radd__), so you can do

s1 = Seq("ATGCATCACAATCG", Alphabet.IUPAC.unambiguous_dna)
s2 = Seq("W", Alphabet.IUPAC.ambiguous_dna)
t = s1 + s2

and get that t is

Seq("ATGCATCACAATCGW", IUPACAmbiguousDNA())

Thus, translation of subsets is:
  translate(seq[5:20] + seq[29:65] + seq[100:200])
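
For anyone who wants to try the idea without the proposal code, here is a minimal sketch using plain Python strings instead of Seq objects. The translate helper and the toy gene are hypothetical stand-ins, not the proposed API; the table is the standard genetic code. The exon slices are joined first and the result is translated once.

```python
# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA string codon by codon; a trailing partial codon is ignored."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Toy gene (made up for illustration): exon, intron, exon.
gene = "ATGGCC" + "GTAAGTTTTAG" + "AAATAA"
exon1 = gene[0:6]
exon2 = gene[17:23]

# Join the exon slices, then translate the spliced coding sequence.
protein = translate(exon1 + exon2)
print(protein)
```

With Seq objects the slices would carry their alphabet along, which is where the strictness described next comes in.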

Addition, as with the other code I've been working on, is
alphabet strict.  In this case, the unambiguous_dna is
a proper subset of ambiguous_dna, so it got promoted.  That
required a new method on alphabets, called "contains".

The default "contains" method is:
  def contains(self, other):
    return isinstance(other, self.__class__)

Some of the encodings, like the gap-character encoding, are
implemented like:
  def contains(self, other):
    if self.gap_char != other.gap_char:
      return 0
    return self.alphabet.contains(other.alphabet)
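
To make the promotion rule concrete, here is a rough sketch of how an alphabet-strict __add__ could use "contains". The class names and the hierarchy are assumptions made for illustration (the unambiguous alphabet is modelled as a subclass of the ambiguous one so that the isinstance test expresses "is a subset of"); this is not the proposal code itself.

```python
class Alphabet:
    def contains(self, other):
        # True when `other` is an instance of this alphabet's class or a subclass.
        return isinstance(other, self.__class__)


class AmbiguousDNA(Alphabet):
    letters = "GATCRYWSMKHBVDN"


class UnambiguousDNA(AmbiguousDNA):
    # Assumption: the subset relation is expressed through subclassing.
    letters = "GATC"


class Protein(Alphabet):
    letters = "ACDEFGHIKLMNPQRSTVWY"


class Seq:
    def __init__(self, data, alphabet):
        self.data = data
        self.alphabet = alphabet

    def __add__(self, other):
        # Keep whichever alphabet contains the other; refuse unrelated ones.
        if self.alphabet.contains(other.alphabet):
            alphabet = self.alphabet
        elif other.alphabet.contains(self.alphabet):
            alphabet = other.alphabet
        else:
            raise TypeError("incompatible alphabets")
        return Seq(self.data + other.data, alphabet)


s1 = Seq("ATGCATCACAATCG", UnambiguousDNA())
s2 = Seq("W", AmbiguousDNA())
t = s1 + s2  # promoted to the ambiguous DNA alphabet
```

Adding a Protein sequence to either of these raises TypeError in this sketch, since neither alphabet contains the other.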

In theory, I could search for a base class in common.  For
example, the base class of all Alphabets is "Alphabet", so
adding a protein and DNA alphabet could return the base class.
However:
  1) the base class in common could be a pure abstract class
    and not meant to be instanced
  2) if there is multiple inheritance, I might not get the
    "right" one
  3) I want to have singleton alphabet definitions, if at all
    possible.
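
Point 1 can be illustrated with a small sketch. Everything here is hypothetical (the class names, the common_base helper, and the use of the abc module and __mro__ to do the search); it only shows that the nearest shared base may be a class that cannot be instantiated.

```python
from abc import ABC, abstractmethod


class Alphabet(ABC):
    @property
    @abstractmethod
    def letters(self):
        """Concrete alphabets must say which letters they allow."""


class DNAAlphabet(Alphabet):
    letters = "GATC"


class ProteinAlphabet(Alphabet):
    letters = "ACDEFGHIKLMNPQRSTVWY"


def common_base(a, b):
    """Return the first class in a's method resolution order that b also has."""
    b_bases = set(type(b).__mro__)
    for cls in type(a).__mro__:
        # `object` is in every MRO, so this loop always returns.
        if cls in b_bases:
            return cls


shared = common_base(DNAAlphabet(), ProteinAlphabet())

# The shared base is the abstract Alphabet, which cannot be instantiated.
try:
    shared()
    instantiable = True
except TypeError:
    instantiable = False
```

With multiple inheritance the first match depends on the order of the first argument's bases, which is the concern in point 2.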

                    Andrew
                    dalke@acm.org