Tue Apr 22 19:51:11 EDT 2003

>
> 	It turns out my program even outperforms Phil Green's
> implementation. It is 10% faster and uses only one-tenth of memory.
> You can download my code from 
> http://www.stanford.edu/~yeeman/global.tgz.
> Here is the performance comparison:
>
> example I used:
> t1.fa and t2.fa at http://www.stanford.edu/~yeeman/
> Both are DNA sequences with about 9800bp
>
> compiler for both ssearch34 and my program:
> gcc -g
>
> command line:
> ./ssearch34 -n -3 -q -H -f 3 -g 1 -d 1 -r "3/-1" -b 1 t1.fa t2.fa
> ./global t1.fa t2.fa
>
> ssearch34:
> 13.5 sec to find score
> 1 min 9 sec to calculate local alignment
> If it takes two score calculation to find the end points of
> lcoal alignment, the global alignment time will be 42 sec.
> Uses 15MB of RAM
>
> My Miller-Myers implementation:
> 38 sec to calculate alignment
> Uses 1.3MB of RAM

This report is unfortunately misleading.  When the "ssearch34" program
compares two sequences and produces the optimal Smith-Waterman 
alignment, it does
the comparison twice, once to get the score and a second time to do the 
alignment.
The score calculation uses the Phil Green optimization, while the 
alignment
calculation uses Hirshberg/Miller/Myers.

Thus, the correct timing comparison would be 13.5 sec for ssearch (Phil 
Green
score) vs 38 sec for Miller/Myers.  This is exactly what is expected - 
a non-
Phil Green Smith-Waterman would be about 50% slower (~20 sec) and 
Miller/Myers/
Hirshberg does two of these full Smith-Waterman's to produce an 
alignment (~40
sec expected).  ssearch may take a bit longer than this because of the 
other
things it is doing - statistics, alignment summary, etc.

The memory requirements are very misleading as well, as ssearch is a 
large program,
with functions for calculating statistical significance, storing many 
different
scoring matrices, and producing many different output formats, and 
space for
storing the results from 100,000's of comparisons.  If one focusses
only on the memory requirements for the Phil Green Smith-Waterman vs 
Miller/Myers
part of ssearch, the memory requirements are identical, about 5 words 
for each
character in the query sequence. (Actually, as implemented, ssearch 
uses about
30 words per residue, because the scoring matrix is kept as a profile.) 
  Focussing
only on the Smith-Waterman memory requirements, to compare 2 10,000 
residue sequences
would require about 200 Kb at 5 words/residue, or 1,200 Kb at 30 
words/residue.

Bill Pearson