[Biopython-dev] Bio.Motif Suggestions

Peter biopython at maubp.freeserve.co.uk
Mon Apr 20 14:35:15 UTC 2009


On Mon, Apr 20, 2009 at 2:55 PM, Dave Bridges <dave.bridges at gmail.com> wrote:
>
>> > Is there an alphabet that accepts spaces which might be necessary for
>> > correct alignment of a motif, and if so will that work with the rest of
>> > motif.py?
>>
>
> That's a tougher one. It wasn't really needed so far (DNA motifs
> rarely have spaces), but I guess that for protein motifs it's a very
> important thing.
> I have some code for doing that, but I will need to find it. I'll
> write you later about it.
>

What would a space in a motif mean?  Clearly something different from
a wildcard like N or X in nucleotide or protein sequences.  Does it
mean a gap of variable length?  If it means a gap of one character
then surely just using a "-" would be sensible (as used in multiple
sequence alignments), for which we already have a gapped alphabet
system set up.
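
To make the single-character gap reading concrete, here is a minimal
sketch in plain Python. It does not use Bio.Motif or Bio.Alphabet; the
function name column_counts and the example instances are made up for
illustration. It assumes the motif instances are already aligned to the
same length, with "-" marking a one-column gap, and simply counts "-"
alongside the ordinary letters in each column.

```python
from collections import Counter


def column_counts(instances):
    """Count the characters seen in each column of aligned motif instances.

    Hypothetical helper: "-" is treated as just another symbol, so a
    one-character gap shows up in the counts for its column.
    """
    lengths = {len(instance) for instance in instances}
    if len(lengths) != 1:
        raise ValueError("aligned motif instances must all have the same length")
    # zip(*instances) walks the alignment column by column.
    return [Counter(column) for column in zip(*instances)]


# Made-up aligned instances; the third column holds a gap in two of them.
aligned = ["AC-GT", "ACAGT", "AC-GA"]
counts = column_counts(aligned)
print(counts[2])
```

A variable-length gap could not be represented this way, which is why
the meaning of a space needs pinning down first.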

Note that there are some issues with the current Bio.Motif code and
alphabets, which should be addressed.  For example, generic alphabets
don't have a letters property giving the list of expected letters, so
using set() on the sequences themselves might be more appropriate in
places.
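
As a rough sketch of the set() idea, in plain Python rather than the
actual Bio.Motif code: the helper name observed_letters and the
stand-in GenericAlphabet class below are hypothetical, and the only
assumption is that a generic alphabet has no usable letters attribute
(here it is None). When the alphabet lists its letters they are used,
otherwise the letters are collected from the sequences themselves.

```python
def observed_letters(instances, alphabet=None):
    """Return the sorted letters to use for a set of motif instances.

    Hypothetical helper: prefer the alphabet's own letters when it
    defines them, otherwise fall back to the letters actually seen.
    """
    letters = getattr(alphabet, "letters", None)
    if letters:
        return sorted(letters)
    seen = set()
    for instance in instances:
        seen.update(str(instance))
    return sorted(seen)


class GenericAlphabet:
    """Stand-in for an alphabet with no defined letters (an assumption)."""

    letters = None


instances = ["TATAAT", "TATGAT", "TACAAT"]
print(observed_letters(instances, GenericAlphabet()))
```

The trade-off is that letters which are valid but never occur in the
instances will be missing from the result.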

Peter


