[Biopython-dev] Calculating motif scores

Bartek Wilczynski bartek at rezolwenta.eu.org
Thu Jul 16 11:32:34 UTC 2009


On Thu, Jul 16, 2009 at 10:50 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>
> Hi everybody,
Hi

>
> I was looking for a way to calculate the position-weight matrix score for a given sequence. Motif.score_hit(sequence,position,normalized=0,masked=0) in Bio/Motif/_Motif.py does what I need, but it calculates the score at only one position. For speed reasons, I am looking for a function that can calculate the scores at all positions in a sequence. Something like
>
> score(pwm, sequence)
>
> returning a Numerical Python array of length len(sequence) - len(pwm) + 1, with the "score" function implemented in a C extension. Perhaps the position-weight matrix should be its own class, with "score" as one of its methods.
>
> Is there perhaps some other function that I can use for this?
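The exhaustive scorer requested above can be sketched in pure NumPy rather than as a C extension. This is an illustration under stated assumptions, not the Bio.Motif API. The name `score_all` is hypothetical. The PWM is assumed to be a (motif_length, 4) array of per-position log-odds scores with columns in ACGT order. The function returns an array of length len(sequence) - len(pwm) + 1.

```python
import numpy as np

# Assumed column order of the PWM (hypothetical convention for this sketch).
ALPHABET = "ACGT"
INDEX = {base: i for i, base in enumerate(ALPHABET)}


def score_all(pwm, sequence):
    """Score every window of `sequence` against `pwm` (hypothetical helper).

    pwm: array-like of shape (motif_length, 4) with log-odds scores,
    columns in ACGT order (an assumption, not Bio.Motif's layout).
    Returns a 1-D array of length len(sequence) - len(pwm) + 1.
    """
    pwm = np.asarray(pwm, dtype=float)
    m = pwm.shape[0]
    n = len(sequence)
    if n < m:
        return np.empty(0)
    # Encode bases as column indices; a non-ACGT letter raises KeyError.
    codes = np.array([INDEX[base] for base in sequence.upper()], dtype=np.intp)
    scores = np.zeros(n - m + 1)
    # Loop over motif positions, not sequence positions: for motif column j,
    # the window starting at i adds pwm[j, codes[i + j]] for all i at once.
    for j in range(m):
        scores += pwm[j, codes[j:j + n - m + 1]]
    return scores


pwm = [[1.0, 0.0, 0.0, 0.0],   # position 1 favours A
       [0.0, 1.0, 0.0, 0.0]]   # position 2 favours C
print(score_all(pwm, "ACAC"))  # windows AC, CA, AC -> [2. 0. 2.]
```

Looping over the motif length keeps the Python-level loop short (motifs are typically much shorter than the sequence), while each iteration is a vectorized operation over all windows. A C extension would remove the remaining overhead, but the result has the same shape.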

The function you are looking for is called search_pwm:

search_pwm(self, sequence, normalized=0, masked=0, threshold=0.0, both=True)
a generator function yielding the hits in a given sequence whose
PWM score is higher than the threshold

> If not, I can contribute a C extension implementing this functionality. If so, are there any preferences on how this should be integrated with Bio.Motif?


As you can see, the current function is a generator rather than
returning a full array, because of memory concerns when searching
large sequences for a few instances of a good motif. If you set the
threshold to -inf, you should get the results for all positions.
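The generator-with-threshold approach can be illustrated with a minimal sketch. It is not Biopython's search_pwm. The name `search_hits` is hypothetical. The PWM is assumed to be a list of rows of scores in ACGT order. The normalized, masked and both (reverse-strand) options are not modelled. It yields (position, score) pairs only for windows scoring strictly above the threshold, so passing -math.inf yields every position.

```python
import math


def search_hits(pwm, sequence, threshold=0.0, alphabet="ACGT"):
    """Yield (position, score) for windows scoring above `threshold`.

    Hypothetical helper: pwm is a list of rows, one per motif position,
    each row holding scores in `alphabet` order (an assumption here).
    Forward strand only; no normalization or masking.
    """
    index = {base: i for i, base in enumerate(alphabet)}
    seq = sequence.upper()
    m = len(pwm)
    for start in range(len(seq) - m + 1):
        score = sum(pwm[j][index[seq[start + j]]] for j in range(m))
        # The generator keeps no list of results, so hits can be streamed
        # from a large sequence without building a full score array.
        if score > threshold:
            yield start, score


pwm = [[1, 0, 0, 0],
       [0, 1, 0, 0]]
print(list(search_hits(pwm, "ACAC", threshold=1.5)))        # [(0, 2), (2, 2)]
print(list(search_hits(pwm, "ACAC", threshold=-math.inf)))  # every position
```

With a finite threshold, only the good hits ever exist in memory. With -inf, collecting the generator gives the same information as an exhaustive score array, just more slowly.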

Nonetheless, if you have a C function doing just that, we could
incorporate it into Biopython for fast exhaustive searches on shorter
sequences.

cheers

Bartek



