[BioPython] Re: translating genes with >1 exon
Andrew Dalke
dalke@acm.org
Mon, 10 Apr 2000 10:41:51 -0600
On the bioperl list, Keith James asked about translating a gene
made of more than one exon, describing these steps:
> Get the PrimarySeq object of each subsequence
> Get the sequence string of each PrimarySeq object
> Join the sequence strings together
> Make a new PrimarySeq object from the string
> Translate that, making yet another PrimarySeq object
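Keith's five steps come down to slice, join, translate. Here is a minimal sketch of that pipeline, with plain Python strings standing in for PrimarySeq objects. `translate`, `translate_exons`, the half-open (start, end) exon coordinates and the toy sequence are hypothetical, and only the standard genetic code is handled:

```python
# Standard genetic code. Codons are enumerated with bases in TCAG order,
# so the codon with base indices (i, j, k) maps to _AMINO[16*i + 4*j + k].
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}


def translate(dna):
    """Translate a DNA string codon by codon; '*' marks a stop codon.

    A trailing partial codon is ignored.
    """
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def translate_exons(genomic, exons):
    """Join the half-open (start, end) slices of `genomic`, then translate."""
    coding = "".join(genomic[start:end] for start, end in exons)
    return translate(coding)


# Two toy exons separated by a 5-base intron.
genomic = "ATGAAA" + "CCCCC" + "TTTTAA"
print(translate_exons(genomic, [(0, 6), (11, 17)]))  # MKF*
```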
In the proposal code I've been working on, I added
__add__ (and I should add __radd__), so you can do
s1 = Seq("ATGCATCACAATCG", Alphabet.IUPAC.unambiguous_dna)
s2 = Seq("W", Alphabet.IUPAC.ambiguous_dna)
t = s1 + s2
and t is
Seq("ATGCATCACAATCGW", IUPACAmbiguousDNA())
Thus, translating a set of subsequences (such as the exons of a gene) is:
translate(seq[5:20] + seq[29:65] + seq[100:200])
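The `translate(seq[5:20] + ...)` idiom depends on slicing returning a Seq and on `+` joining Seqs. A minimal sketch of just that part, assuming a toy `Seq` class whose alphabet is a plain label. Requiring identical alphabets is a simplification (the rule in this post also allows promotion), and letting a plain string adopt the Seq's alphabet in `__radd__` is an assumption about what `__radd__` would be used for:

```python
class Seq:
    """Toy sequence: a string of letters plus an alphabet label."""

    def __init__(self, data, alphabet):
        self.data = data
        self.alphabet = alphabet

    def __repr__(self):
        return f"Seq({self.data!r}, {self.alphabet!r})"

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        # Slicing yields another Seq with the same alphabet, so slices can be added.
        if isinstance(index, slice):
            return Seq(self.data[index], self.alphabet)
        return self.data[index]

    def __add__(self, other):
        if isinstance(other, Seq):
            # Strict: only identical alphabets may be joined in this sketch.
            if other.alphabet != self.alphabet:
                raise TypeError(f"cannot add {self.alphabet} and {other.alphabet}")
            return Seq(self.data + other.data, self.alphabet)
        if isinstance(other, str):
            return Seq(self.data + other, self.alphabet)
        return NotImplemented

    def __radd__(self, other):
        # Called for "str + Seq", because str.__add__ returns NotImplemented
        # when given a Seq.
        if isinstance(other, str):
            return Seq(other + self.data, self.alphabet)
        return NotImplemented


seq = Seq("ATGAAACCCCCTTTTAA", "dna")
print(seq[0:6] + seq[11:17])  # Seq('ATGAAATTTTAA', 'dna')
print("CCC" + seq[0:3])       # Seq('CCCATG', 'dna')
```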
Addition, as with the other code I've been working on, is
alphabet-strict. In this case, unambiguous_dna is a proper
subset of ambiguous_dna, so the result was promoted to
ambiguous_dna. That required a new alphabet method, called "contains".
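To make "proper subset" concrete, here is a minimal sketch that compares letter sets directly. This is a swapped-in technique, not the class-based "contains" described in this post: `LetterAlphabet` and `promote` are hypothetical, while the letter strings are the IUPAC DNA codes (GATC, plus the ambiguity codes RYWSMKHBVDN) and the 20 standard one-letter amino-acid codes:

```python
class LetterAlphabet:
    """Hypothetical alphabet defined only by a name and a set of letters."""

    def __init__(self, name, letters):
        self.name = name
        self.letters = frozenset(letters)

    def contains(self, other):
        # True when every letter of `other` is also a letter of this alphabet.
        return other.letters <= self.letters


def promote(a, b):
    """Return the alphabet to use for a + b, or raise TypeError."""
    if a.contains(b):
        return a
    if b.contains(a):
        return b
    raise TypeError(f"incompatible alphabets: {a.name}, {b.name}")


unambiguous_dna = LetterAlphabet("unambiguous_dna", "GATC")
ambiguous_dna = LetterAlphabet("ambiguous_dna", "GATCRYWSMKHBVDN")
protein = LetterAlphabet("protein", "ACDEFGHIKLMNPQRSTVWY")

print(promote(unambiguous_dna, ambiguous_dna).name)  # ambiguous_dna
```

Comparing letters has a weakness: G, A, T and C are also valid amino-acid codes, so `promote(unambiguous_dna, protein)` returns the protein alphabet. That is one reason to key "contains" on alphabet classes rather than on letters.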
The default "contains" method is:
    def contains(self, other):
        return isinstance(other, self.__class__)
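Here is a minimal sketch of the class-based default. For the promotion above to work, it assumes the unambiguous DNA class derives from the ambiguous one. The class and instance names are hypothetical stand-ins modelled on the post. The argument order matters: `isinstance(obj, cls)` takes the instance first, so the test is whether `other` is an instance of `self`'s class:

```python
class Alphabet:
    def contains(self, other):
        # `other` fits in `self` if it is an instance of self's class
        # (including any subclass of it).
        return isinstance(other, self.__class__)


class ProteinAlphabet(Alphabet):
    pass


class IUPACAmbiguousDNA(Alphabet):
    pass


class IUPACUnambiguousDNA(IUPACAmbiguousDNA):
    # The narrower alphabet is modelled as a subclass of the wider one.
    pass


# One shared instance per alphabet ("singleton" definitions).
protein = ProteinAlphabet()
ambiguous_dna = IUPACAmbiguousDNA()
unambiguous_dna = IUPACUnambiguousDNA()

print(ambiguous_dna.contains(unambiguous_dna))  # True
print(unambiguous_dna.contains(ambiguous_dna))  # False
print(protein.contains(ambiguous_dna))          # False
```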
Some of the encodings, like the gap-character encoding, are
implemented like this:
    def contains(self, other):
        if self.gap_char != other.gap_char:
            return 0
        return self.alphabet.contains(other.alphabet)
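The gapped encoding wraps another alphabet, so its "contains" checks the gap character first and then delegates to the wrapped alphabet. Here is a minimal sketch, assuming a hypothetical `Gapped` wrapper with `alphabet` and `gap_char` attributes (the names used in the snippet above). Unlike that snippet, it returns False instead of raising AttributeError when `other` has no gap character:

```python
class Alphabet:
    def contains(self, other):
        return isinstance(other, self.__class__)


class IUPACAmbiguousDNA(Alphabet):
    pass


class IUPACUnambiguousDNA(IUPACAmbiguousDNA):
    pass


class Gapped(Alphabet):
    """Hypothetical wrapper: an underlying alphabet plus a gap character."""

    def __init__(self, alphabet, gap_char="-"):
        self.alphabet = alphabet
        self.gap_char = gap_char

    def contains(self, other):
        # Only another gapped alphabet with the same gap character can fit.
        if not isinstance(other, Gapped) or self.gap_char != other.gap_char:
            return False
        # After that, the wrapped alphabets decide.
        return self.alphabet.contains(other.alphabet)


ambiguous_dna = IUPACAmbiguousDNA()
unambiguous_dna = IUPACUnambiguousDNA()

print(Gapped(ambiguous_dna).contains(Gapped(unambiguous_dna)))       # True
print(Gapped(ambiguous_dna).contains(Gapped(unambiguous_dna, ".")))  # False
print(Gapped(unambiguous_dna).contains(Gapped(ambiguous_dna)))       # False
```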
In theory, I could search for a base class in common. For
example, the base class of all alphabets is "Alphabet", so
adding a protein alphabet and a DNA alphabet could return that base class.
However:
1) the base class in common could be a pure abstract class
and not meant to be instantiated
2) if there is multiple inheritance, I might not get the
"right" one
3) I want to have singleton alphabet definitions, if at all
possible.
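To see these problems concretely, here is a minimal sketch of the "search for a base class in common" idea. It walks the method resolution order (`__mro__`) of the left operand's class. The function `common_base` and the class names are hypothetical:

```python
class Alphabet:
    """Abstract root; not meant to be instantiated directly."""


class DNAAlphabet(Alphabet):
    pass


class ProteinAlphabet(Alphabet):
    pass


class GappedAlphabet(Alphabet):
    pass


# Two alphabets built by multiple inheritance, with the bases listed
# in opposite orders.
class GappedDNA(DNAAlphabet, GappedAlphabet):
    pass


class DNAGapped(GappedAlphabet, DNAAlphabet):
    pass


def common_base(a, b):
    """Return the first class in type(a)'s MRO that `b` is also an instance of."""
    for cls in type(a).__mro__:
        if isinstance(b, cls):
            return cls
    return object  # unreachable in practice: every MRO ends with object


print(common_base(DNAAlphabet(), ProteinAlphabet()).__name__)  # Alphabet
print(common_base(GappedDNA(), DNAGapped()).__name__)          # DNAAlphabet
print(common_base(DNAGapped(), GappedDNA()).__name__)          # GappedAlphabet
```

The first call lands on the abstract root (problem 1). The last two give different answers depending on operand order (problem 2). In every case the result is a class, so a new instance would have to be created, which conflicts with singleton alphabets (problem 3).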
Andrew
dalke@acm.org