[Bioperl-l] An update for my DNA Smith-Waterman code

Yee Man ymc at paxil.stanford.edu
Sat Jan 25 18:00:09 EST 2003


Hi, folks,

	I updated my DNA Smith-Waterman to include the Gotoh way as of his
1981 paper. The algorithm I previously implemented is a modified Gotoh
algorithm that maximizes the speed regardless of the memory usage. Here is
a performance comparison between the two when I tried to align two
very similar sequences with 10k bp each in my SPARC:

1) True Gotoh
max memory: 35MB
time: 1 min 54 sec

2) Modified Gotoh
max memory: 395MB
time: 1 min 3 sec

	As expected, the true Gotoh algorithm is two times slower because
it needs another run to mark the gap extensions in the other
sequence. The code can be downloaded from 
http://www.stanford.edu/~yeeman/dsw.tar.gz

Future work:
1) Myers et al's Linear Space implementation (Should take even less space
than True Gotoh but it can take twice as much time because we need 2
passes to find the alignment end points [not sure about this actually
but I don't know how to do it in one pass] and then another 2 passes to do
the linear space global alignment.)
2) An implementation for zero gap opening penalty (Can be implemented as
fast as Modified Gotoh but memory usage is equivalent to True Gotoh)
3) Support for global alignment and ends free alignment (ie no gap penalty
for gaps at the ends of sequences, good for aligning overlapping clones.)

       For those of you who are interested in this field or find
application of DNA Smith-Waterman, please try out my code and give me 
comments.

Best Regards,
Yee Man

On Sun, 12 Jan 2003, Ewan Birney wrote:

> 
> 
> On Sat, 11 Jan 2003, Yee Man wrote:
> 
> >
> > Hi folks,
> >
> > 	I wrote a module that does DNA Smith-Waterman with affine gap
> > penalty in the form of gap_cost = gap_penalty + extension_penalty * #gaps.
> > I am wondering if someone here would allow me to add the code to the main
> > tree.
> >
> > 	It is currently implemented with half-hearted Gotoh's improvement.
> > Instead of storing the scoring matrix as an auxillary array and calculates
> > it twice, I store the whole thing so that I only need to calculate it
> > once. I also have a module that implements the whole Gotoh's improvement.
> > The current one is good with small alignments. I can add the true Gotoh's
> > improvement module as a fallback when we don't have the memory to run the
> > doubly faster version.
> >
> > 	The code works basically the same as pSW and it returns a
> > SimpleAlign object at the end. You can check align.pl in the
> > attached package for details. It shouldn't take me too long to incorporate
> > the code to the bioperl tree.
> 
> I will take a look at it and integrate it in.
> 
> >
> > Regards,
> > Yee Man
> >
> > PS Ewan: I read the comments in pSW.pm and it says it is going to use
> > linear space to store the directional matrix. Is that really true? I
> > remember when I read Ron Shamir's lecture notes, Hirschberg's divide and
> > conquer only works for Global Alignment and not Local Alignment that SW is
> > supposed to solve. Did I miss something?
> >
> 
> The first pass of the method converts the local alignment to a global
> alignment by finding the start/end points. then you drop into divide and
> conquor. my divide and conquor ...ummm... sucks ... because I use a high
> overhead memory approach (which i call a shadow-matrix). Not worth
> copying; far easier to write than the forward/backward meeting for complex
> DPs
> 
> 
> > PPS What is the name of Phil Green's paper to optimize SW?
> >
> 
> SWAT is the program.
> 
> 
> 
> 



More information about the Bioperl-l mailing list