[Bioperl-l] Implemented Miller-Myers algorithm for local alignment

Yee Man ymc at paxil.stanford.edu
Mon Apr 28 18:28:11 EDT 2003


Hi, folks,

	By adding two passes to find the ends of the local alignment and
then apply the linear space global alignment, I extended my previously
implemented Miller-Myers (1988) algorithm to handle local alignment. It is
now incorporated into my Perl XS module that is ready to be included in
bioperl. You can download it from:

http://www.stanford.edu/~yeeman/dsw.tgz

	A quick test with this and the Phil Green implementation in the
Perl XS incarnations show that mine uses 28 sec and 5.8MB RAM whereas the
PG one uses 55 sec and 8.4MB to align the two 10kb sequences located at:

http://www.stanford.edu/~yeeman/t1.fa
http://www.stanford.edu/~yeeman/t2.fa

	The current implementation is for DNA only (or any scoring scheme
with only match and mismatch). I can make it work with protein (or any
scoring matrix) if people asks for it.

	Feel free to try it out and give me comments.

Thanks and have a great day!
Yee Man

On Mon, 21 Apr 2003, Yee Man wrote:

> 
> Hi, folks,
> 
> 	After another month of hacking around, I followed the Miller-Myers
> paper (1994) and implemented the Hirschberg divide and conquer algorithm. 
> This is a global alignment algorithm (ie Needleman-Wunsch) but it can be
> extended to become Smith-Waterman if I calculate the scores to find the
> ends.
> 
> 	It turns out my program even outperforms Phil Green's
> implementation. It is 10% faster and uses only one-tenth of memory.
> You can download my code from http://www.stanford.edu/~yeeman/global.tgz.
> Here is the performance comparison:
> 
> example I used:
> t1.fa and t2.fa at http://www.stanford.edu/~yeeman/
> Both are DNA sequences with about 9800bp
> 
> compiler for both ssearch34 and my program:
> gcc -g
> 
> command line:
> ./ssearch34 -n -3 -q -H -f 3 -g 1 -d 1 -r "3/-1" -b 1 t1.fa t2.fa
> ./global t1.fa t2.fa 
> 
> ssearch34: 
> 13.5 sec to find score
> 1 min 9 sec to calculate local alignment
> If it takes two score calculation to find the end points of
> lcoal alignment, the global alignment time will be 42 sec.
> Uses 15MB of RAM
> 
> My Miller-Myers implementation:
> 38 sec to calculate alignment 
> Uses 1.3MB of RAM
> 
> 	It seems to me my program is 10% faster than Phil Green's
> algorithm. But based on my understanding, I didn't even put any heuristics
> into it like Phil Green. So far, my program seems to work alright with my
> 10 or so examples. Can someone also try it out and see how it performs?
> 
> 	Ewan: If you like my program, I can modify it to handle protein as
> well.
> 
> Thanks
> Yee Man
> 
> 



More information about the Bioperl-l mailing list