[Biopython] Bio.Motif search_pwm

Michiel de Hoon mjldehoon at yahoo.com
Wed Aug 1 05:14:52 UTC 2012


Hi everybody,

I was using the search_pwm method in Bio.Motif (which, by the way, is very useful; thanks, Bartek) to search for motif instances on both strands of a sequence. If the motif starts at index position on the forward strand, this function returns +position; if it is located on the reverse strand, it returns -position. For position == 0 the two cases coincide, since -0 == 0, so we cannot deduce from the sign whether the motif is located on the forward or on the reverse strand.

How about using Python-style negative indices to indicate the strand? For example, +20 means that the motif is located at [20:20+motif_length] on the forward strand, while -20 means that the motif is located at [-20:-20+motif_length] on the reverse strand.
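
To make the negative-index idea concrete, here is a minimal sketch of one possible reading of it. The helpers to_signed_index and site_slice are hypothetical names, not part of Bio.Motif, and the convention that a reverse-strand hit is encoded as position - len(sequence) is an assumption. Under that assumption a reverse-strand hit at the start of the sequence becomes -len(sequence) rather than -0, so the sign is always informative.

```python
def to_signed_index(position, strand, sequence_length):
    """Encode a forward-coordinate start index as a signed index.

    Assumed convention: forward-strand hits keep their index,
    reverse-strand hits are stored as a Python-style negative index.
    """
    if strand == "+":
        return position
    # Always strictly negative, even when position == 0.
    return position - sequence_length


def site_slice(sequence, signed_index, motif_length):
    """Return the stretch of sequence covered by a hit."""
    # Map a negative index back to a forward coordinate before slicing.
    start = signed_index % len(sequence)
    return sequence[start:start + motif_length]


sequence = "TTTCGAAACC"
first = to_signed_index(0, "-", len(sequence))   # reverse-strand hit at the start
last = to_signed_index(6, "-", len(sequence))    # reverse-strand hit at the end
first_site = site_slice(sequence, first, 4)
last_site = site_slice(sequence, last, 4)

# Slicing directly with [i:i+motif_length] fails when the hit touches the
# end of the sequence, because the stop index becomes 0.
naive_last_site = sequence[last:last + 4]
```

One detail to watch with this convention: a literal [-n:-n+motif_length] slice is empty when the motif ends exactly at the end of the sequence, which is why the sketch converts to a forward coordinate first.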

Alternatively, we could return the strand explicitly.
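
As a rough illustration of returning the strand explicitly, the sketch below yields (position, strand) pairs. It is not the search_pwm implementation: find_sites and reverse_complement are hypothetical helpers, and exact string matching stands in for PWM scoring to keep the example short.

```python
def reverse_complement(seq):
    """Reverse-complement an upper-case DNA string (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def find_sites(sequence, motif):
    """Yield (position, strand) for exact matches on either strand.

    position is always the start index in forward coordinates;
    strand is "+" or "-". Exact matching is a stand-in for PWM scoring.
    """
    sequence = str(sequence).upper()
    motif = str(motif).upper()
    rc_motif = reverse_complement(motif)
    n = len(motif)
    for position in range(len(sequence) - n + 1):
        window = sequence[position:position + n]
        if window == motif:
            yield position, "+"
        if window == rc_motif:
            yield position, "-"


hits = list(find_sites("TTTCGAAACC", "GAAA"))
# The reverse-strand hit at position 0 is now unambiguous.
```

With a signed integer, the hit at position 0 would be reported as 0 on either strand; the tuple keeps the two cases apart.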

In the same function, I wish we could get rid of this line:

sequence=sequence.tostring().upper()

since this assumes that sequence is a Biopython Seq object, and not a plain string. We could either use str(sequence) instead of sequence.tostring() to cover both cases, or have the Seq class inherit from strings (which we have been discussing for some time; see https://redmine.open-bio.org/issues/2351).
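
The str(sequence) option can be sketched as follows. SeqLike is a hypothetical stand-in class, not Biopython's Seq; the only assumption is that the real object returns its sequence from __str__, as the stand-in does.

```python
class SeqLike:
    """Hypothetical stand-in for a Seq-like object (not Biopython's Seq)."""

    def __init__(self, data):
        self._data = data

    def __str__(self):
        return self._data

    def tostring(self):
        return self._data


def normalize(sequence):
    """Upper-case a sequence given as a plain string or a Seq-like object."""
    # str() works on both types; .tostring() exists only on the Seq-like one.
    return str(sequence).upper()


normalized_from_str = normalize("acgt")
normalized_from_seq = normalize(SeqLike("acgt"))
```

A plain str has no tostring() method, so the current line raises AttributeError for string input, while str() accepts both.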

Best,
-Michiel.


