[Bioperl-l] Re: Miller-Myers Algorithm

Yee Man ymc at paxil.stanford.edu
Fri May 2 17:01:15 EDT 2003


Hi, Dr Pearson and Aaron,

>This report is unfortunately misleading.  When the "ssearch34" program
>compares two sequences and produces the optimal Smith-Waterman 
>alignment, it does the comparison twice, once to get the score and a
>second time to do the alignment.
>The score calculation uses the Phil Green optimization, while the 
>alignment calculation uses Hirshberg/Miller/Myers.

I have implemented another version of Miller-Myers algorithm that does
almost two passes (find one pair of end points and then go back to find
the starting points. Is there a better way?) to find the end points and
then do Miller-Myers global alignment to align the subsequences bounded by
the end points. This should be equivalent to ssearch in terms of purpose

I believe my statement was misleading as far as space consumption is
concerned because I didn't state that ssearch uses memory space to do
other things. I am sorry if that causes any inconvenience. However, I
still stand by the statements related to speed because I did mention it
took four passes to calculate a local alignment and I stated clearly that
your time should be halved before any comparison to my program.

>Thus, the correct timing comparison would be 13.5 sec for ssearch (Phil 
>Green score) vs 38 sec for Miller/Myers.  This is exactly what is
>expected - a non-Phil Green Smith-Waterman would be about 50% slower
> (~20 sec) and Miller/Myers/Hirshberg does two of these full
> Smith-Waterman's to produce an alignment (~40 sec expected).  ssearch
> may take a bit longer than this because of the other things it is doing
> - statistics, alignment summary, etc.

>Bill Pearson

You can download my code at 
http://www.stanford.edu/~yeeman/dsw.tgz

This is a perl XS implementation. It has both my code (in linspc.c) and
Phil Green's code (inside pgreen directory) I extracted from ssearch
source code. The memory usage now becomes 6.5MB vs 8.5MB and 28 sec vs 56
sec in my machine. I removed lots of "junk" in that Phil Green code but it
still underperform my code. In fact, Aaron checked my Phil Green library
before. He thinks my library is correct but he said it was slower than he
thought. He told me he will look into that but I never heard from him
about this since January.

I believe Phil Green's code should be theoretically faster (although I
couldn't find his paper/documentation). There may be something wrong in my
Phil Green library I don't notice. It would be great if Aaron can point
that out.

Thanks a lot.
Yee Man




More information about the Bioperl-l mailing list