[Biopython-dev] Alphabet bug in Bio.Motif and Bio.motifs

Wed Jun 5 10:12:38 UTC 2013

> I wouldn't want to subclass sets due to the fact that in many
> existing uses of the alphabets the order of the letters is
> important (and this is not specified in a Python set).

OK, then indeed a set wouldn't be appropriate.

> But I agree that a rationalised alphabet system like that could
> work better. Here equality testing could be on both being the
> same type, e.g. DNA, and having the same letters - including
> special letters for gaps or stop codons (which are the nastiest
> part of the current alphabet object system)?

I guess that it depends on how the alphabet is used. For example, in the case described in the bug report the order of the letters doesn't matter, but in other cases it may.
Personally, I almost never use alphabets. Can anybody give some real-life examples of how they are used?
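To illustrate the point that it depends on how the alphabet is used, here is a small plain-Python sketch. It makes no Biopython calls, and the helpers `uses_only` and `counts_to_rows` are hypothetical. Checking whether a motif only uses DNA letters is a membership question, so letter order is irrelevant. Building a count matrix whose rows follow the alphabet, however, does depend on order.

```python
# Hypothetical illustration only; none of these names are Biopython API.
DNA_LETTERS = "ACGT"

def uses_only(letters, sequence):
    """Order-insensitive check: are all symbols in `sequence` drawn from `letters`?"""
    return set(sequence) <= set(letters)

def counts_to_rows(letters, sequences):
    """Order-sensitive use: row i holds, per column, the count of letters[i]
    across the (equal-length) aligned sequences."""
    length = len(sequences[0])
    return [
        [sum(1 for seq in sequences if seq[i] == letter) for i in range(length)]
        for letter in letters
    ]

instances = ["TACAA", "TACGA", "TACAC"]
print(uses_only(DNA_LETTERS, "TACAA"))       # True
print(uses_only("TGCA", "TACAA"))            # True: same letters, other order
print(counts_to_rows("ACGT", instances)[0])  # [0, 3, 0, 2, 2], the "A" row
print(counts_to_rows("TGCA", instances)[0])  # [3, 0, 0, 0, 0], now the "T" row
```

With the same letters in a different order, the membership check is unchanged but the first row of the matrix now means something else.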

Best,
-Michiel

________________________________
From: Peter Cock <p.j.a.cock at googlemail.com>
To: Bartek Wilczynski <barwil at gmail.com> 
Cc: Michiel de Hoon <mjldehoon at yahoo.com>; Biopython-Dev Mailing List <biopython-dev at biopython.org> 
Sent: Wednesday, June 5, 2013 6:32 PM
Subject: Re: Alphabet bug in Bio.Motif and Bio.motifs

On Wed, Jun 5, 2013 at 9:13 AM, Bartek Wilczynski <barwil at gmail.com> wrote:
> I'm a bit out of the loop here, but to me it seems like a simple issue:
>
> Why not change the problematic code:
>
>  if self.alphabet!=IUPAC.unambiguous_dna:
>         raise ValueError("Wrong alphabet! Use only with DNA motifs")
>
> into:
>
>  if type(self.alphabet)!=type(IUPAC.unambiguous_dna):
>         raise ValueError("Wrong alphabet! Use only with DNA motifs")
>
> and worry about fixing the Bio.Alphabet issues later (it does sound
> reasonable to make sure that any alphabet instance is a singleton).
>
> best
> Bartek
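The failure this change works around can be reproduced with stand-in classes. This is a hedged sketch: `Alphabet`, `DNAAlphabet`, `IUPACUnambiguousDNA` and `unambiguous_dna` below are minimal hypothetical stand-ins, not imported from Biopython. It assumes, as the singleton remark above implies, that the alphabet objects define no `__eq__`, so `!=` falls back to identity and a second instance of the same class compares unequal.

```python
# Hypothetical stand-ins for the old Bio.Alphabet classes. No __eq__ is
# defined, so == and != fall back to comparing object identity.
class Alphabet:
    letters = None

class DNAAlphabet(Alphabet):
    pass

class IUPACUnambiguousDNA(DNAAlphabet):
    letters = "GATC"

# Shared module-level instance, playing the role of IUPAC.unambiguous_dna.
unambiguous_dna = IUPACUnambiguousDNA()

def check_by_instance(alphabet):
    # Original check: rejects every instance except the shared one.
    if alphabet != unambiguous_dna:
        raise ValueError("Wrong alphabet! Use only with DNA motifs")

def check_by_type(alphabet):
    # Proposed check: compares the classes instead of the instances.
    if type(alphabet) != type(unambiguous_dna):
        raise ValueError("Wrong alphabet! Use only with DNA motifs")

fresh = IUPACUnambiguousDNA()  # e.g. an alphabet created somewhere else
check_by_type(fresh)           # passes
try:
    check_by_instance(fresh)
except ValueError as err:
    print("instance check rejected it:", err)
```

A `type()` comparison also rejects subclasses of the expected class, which is one reason to prefer `isinstance`.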

I would prefer a more duck-typing approach (is it DNA? Does it use the
expected set of letters?), but that sounds practical. Could you try using
isinstance instead, though (see PEP 8), and then make that fix with a
new unit test based on the original query, please?
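Here is a sketch of that suggestion: an `isinstance` check plus a duck-typed look at the letters, with a `unittest` case in the spirit of the requested test. The stand-in classes and the `check_dna_alphabet` helper are hypothetical. Neither the real Bio.motifs code nor the original query from the bug report is reproduced here.

```python
import unittest

# Hypothetical stand-ins for the old Bio.Alphabet hierarchy.
class Alphabet:
    letters = None

class DNAAlphabet(Alphabet):
    pass

class IUPACUnambiguousDNA(DNAAlphabet):
    letters = "GATC"

def check_dna_alphabet(alphabet):
    """Accept any DNA alphabet (subclasses included) whose letters are A, C, G, T."""
    # isinstance() accepts subclasses, unlike comparing type() objects.
    if not isinstance(alphabet, DNAAlphabet):
        raise ValueError("Wrong alphabet! Use only with DNA motifs")
    # Duck-typed part: only which letters are present matters, not their order.
    if alphabet.letters is None or set(alphabet.letters) != set("ACGT"):
        raise ValueError("Wrong alphabet! Expected the letters A, C, G and T")

class TestDnaAlphabetCheck(unittest.TestCase):
    def test_fresh_instance_is_accepted(self):
        # A new instance, not a shared module-level one, must pass.
        check_dna_alphabet(IUPACUnambiguousDNA())

    def test_generic_alphabet_is_rejected(self):
        with self.assertRaises(ValueError):
            check_dna_alphabet(Alphabet())
```

Saved as, say, `test_alphabet_check.py` (a hypothetical file name), this runs with `python -m unittest test_alphabet_check`.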

> On Wed, Jun 5, 2013 at 4:28 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>> Hi Peter,
>>
>> I have never quite understood why we need a separate class for each
>> alphabet.
>> I would think that a single alphabet class (or maybe a DNA, an RNA, and a
>> protein alphabet class) is sufficient, and that the specific alphabets are
>> instances of this class.
>> Also, alphabets are essentially sets of letters, so an Alphabet class should
>> inherit from set, allowing us to use its associated methods to compare
>> alphabets to each other.
>>
>> Best,
>> -Michiel.
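Michiel's proposal might look roughly like the sketch below, with one class per molecule type and the specific alphabets as instances of it. All names are hypothetical. It swaps in `frozenset` rather than the mutable `set` as the base class so that alphabets are immutable and hashable; set comparisons then come for free.

```python
class Alphabet(frozenset):
    """An alphabet as an immutable set of one-letter symbols (hypothetical)."""

class DNAAlphabet(Alphabet):
    pass

class ProteinAlphabet(Alphabet):
    pass

# Specific alphabets are instances, not subclasses.
unambiguous_dna = DNAAlphabet("ACGT")
ambiguous_dna = DNAAlphabet("ACGTRYKMSWBDHVN")
protein = ProteinAlphabet("ACDEFGHIKLMNPQRSTVWY")

# A separately built instance with the same letters compares equal.
print(DNAAlphabet("TGCA") == unambiguous_dna)  # True
# Set operators give subset tests between alphabets.
print(unambiguous_dna <= ambiguous_dna)        # True
```

There are two caveats. `frozenset` equality ignores the subclass, so `ProteinAlphabet("ACGT") == DNAAlphabet("ACGT")` is also `True`. The letter order is lost as well.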

I wouldn't want to subclass sets due to the fact that in many
existing uses of the alphabets the order of the letters is
important (and this is not specified in a Python set).

But I agree that a rationalised alphabet system like that could
work better. Here equality testing could be on both being the
same type, e.g. DNA, and having the same letters - including
special letters for gaps or stop codons (which are the nastiest
part of the current alphabet object system)?
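One possible reading of this suggestion, sketched under stated assumptions: the letters are kept as an ordered string, the gap and stop characters are stored explicitly, and equality means the same class plus the same letters and special symbols. The `Alphabet`, `DNAAlphabet` and `ProteinAlphabet` classes and their `gap`/`stop` attributes are hypothetical, not Biopython API. This version compares letters in order. Comparing `set(self.letters)` instead would make equality order-insensitive.

```python
class Alphabet:
    """Ordered alphabet with optional gap and stop symbols (hypothetical design)."""

    def __init__(self, letters, gap=None, stop=None):
        self.letters = letters  # a string, so letter order is preserved
        self.gap = gap
        self.stop = stop

    def _key(self):
        return (self.letters, self.gap, self.stop)

    def __eq__(self, other):
        # Equal only for exactly the same class with the same ordered
        # letters and the same gap/stop symbols.
        if type(self) is not type(other):
            return NotImplemented  # falls back to identity, i.e. unequal
        return self._key() == other._key()

    def __hash__(self):
        # Consistent with __eq__, so alphabets can be dict keys or set members.
        return hash((type(self), self._key()))

    def __repr__(self):
        return (f"{type(self).__name__}({self.letters!r}, "
                f"gap={self.gap!r}, stop={self.stop!r})")

class DNAAlphabet(Alphabet):
    pass

class ProteinAlphabet(Alphabet):
    pass

dna = DNAAlphabet("ACGT")
print(dna == DNAAlphabet("ACGT"))           # True: new instance, same letters
print(dna == DNAAlphabet("ACGT", gap="-"))  # False: gap symbol differs
print(dna == ProteinAlphabet("ACGT"))       # False: different type
```

Because equality no longer depends on identity, a freshly created alphabet passes an `==` check without any need for singletons.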

Peter