[Bioperl-l] Implemented Miller-Myers alignment algorithm It
turns out my program even outperforms Phil Green's implementation. It is 10%
faster and uses only one-tenth of memory. You can download my code from
http://www.stanford.edu/~yeeman/global.tgz. Here is the performance
comparison: example I used: t1.fa and t2.fa at
http://www.stanford.edu/~yeeman/ Both are DNA sequences with about 9800bp
compiler for both ssearch34 and my program: gcc -g command line: ./ssearch34
-n -3 -q -H -f 3 -g 1 -d 1 -r "3/-1" -b 1 t1.fa t2.fa ./global t1.fa t2.fa
ssearch34: 13.5 sec to find score 1 min 9 sec to calculate local alignment If
it takes two score calculation to find the end points of lcoal alignment, the
global alignment time will be 42 sec. Uses 15MB of RAM My Miller-Myers
implementation: 38 sec to calculate alignment Uses 1.3MB of RAM
William R.Pearson
wrp at alpha0.bioch.virginia.edu
Tue Apr 22 19:51:11 EDT 2003
>
> It turns out my program even outperforms Phil Green's
> implementation. It is 10% faster and uses only one-tenth of memory.
> You can download my code from
> http://www.stanford.edu/~yeeman/global.tgz.
> Here is the performance comparison:
>
> example I used:
> t1.fa and t2.fa at http://www.stanford.edu/~yeeman/
> Both are DNA sequences with about 9800bp
>
> compiler for both ssearch34 and my program:
> gcc -g
>
> command line:
> ./ssearch34 -n -3 -q -H -f 3 -g 1 -d 1 -r "3/-1" -b 1 t1.fa t2.fa
> ./global t1.fa t2.fa
>
> ssearch34:
> 13.5 sec to find score
> 1 min 9 sec to calculate local alignment
> If it takes two score calculation to find the end points of
> lcoal alignment, the global alignment time will be 42 sec.
> Uses 15MB of RAM
>
> My Miller-Myers implementation:
> 38 sec to calculate alignment
> Uses 1.3MB of RAM
This report is unfortunately misleading. When the "ssearch34" program
compares two sequences and produces the optimal Smith-Waterman
alignment, it does
the comparison twice, once to get the score and a second time to do the
alignment.
The score calculation uses the Phil Green optimization, while the
alignment
calculation uses Hirshberg/Miller/Myers.
Thus, the correct timing comparison would be 13.5 sec for ssearch (Phil
Green
score) vs 38 sec for Miller/Myers. This is exactly what is expected -
a non-
Phil Green Smith-Waterman would be about 50% slower (~20 sec) and
Miller/Myers/
Hirshberg does two of these full Smith-Waterman's to produce an
alignment (~40
sec expected). ssearch may take a bit longer than this because of the
other
things it is doing - statistics, alignment summary, etc.
The memory requirements are very misleading as well, as ssearch is a
large program,
with functions for calculating statistical significance, storing many
different
scoring matrices, and producing many different output formats, and
space for
storing the results from 100,000's of comparisons. If one focusses
only on the memory requirements for the Phil Green Smith-Waterman vs
Miller/Myers
part of ssearch, the memory requirements are identical, about 5 words
for each
character in the query sequence. (Actually, as implemented, ssearch
uses about
30 words per residue, because the scoring matrix is kept as a profile.)
Focussing
only on the Smith-Waterman memory requirements, to compare 2 10,000
residue sequences
would require about 200 Kb at 5 words/residue, or 1,200 Kb at 30
words/residue.
Bill Pearson
More information about the Bioperl-l
mailing list