[Bioperl-l] An update for my DNA Smith-Waterman code

Aaron J Mackey ajm6q at virginia.edu
Sat Jan 25 23:25:09 EST 2003


Yee, you might also consider comparing your code and performance to that
of SSEARCH in the FASTA package (found in dropgsw.c) - this is a
linear-space SW, includes affine gap penalties, and has the Phil Green
SWAT optimization; and if you have a way of making that faster, the world
will beat a path to your door.  If you're developing under OS X on a Mac
G4, I can also send you the patches to make it altivec-enabled (and
approx. 6-8 times faster).

The basic SSEARCH code is also that used by the EMBOSS tools; it's
well-tested code against which you should be able to compare your results
to ensure alignment accuracy.

-Aaron

On Sat, 25 Jan 2003, Yee Man wrote:

>
> Hi, folks,
>
> 	I updated my DNA Smith-Waterman to include the Gotoh way as of his
> 1981 paper. The algorithm I previously implemented is a modified Gotoh
> algorithm that maximizes the speed regardless of the memory usage. Here is
> a performance comparison between the two when I tried to align two
> very similar sequences with 10k bp each in my SPARC:
>
> 1) True Gotoh
> max memory: 35MB
> time: 1 min 54 sec
>
> 2) Modified Gotoh
> max memory: 395MB
> time: 1 min 3 sec
>
> 	As expected, the true Gotoh algorithm is two times slower because
> it needs another run to mark the gap extensions in the other
> sequence. The code can be downloaded from
> http://www.stanford.edu/~yeeman/dsw.tar.gz
>
> Future work:
> 1) Myers et al's Linear Space implementation (Should take even less space
> than True Gotoh but it can take twice as much time because we need 2
> passes to find the alignment end points [not sure about this actually
> but I don't know how to do it in one pass] and then another 2 passes to do
> the linear space global alignment.)
> 2) An implementation for zero gap opening penalty (Can be implemented as
> fast as Modified Gotoh but memory usage is equivalent to True Gotoh)
> 3) Support for global alignment and ends free alignment (ie no gap penalty
> for gaps at the ends of sequences, good for aligning overlapping clones.)
>
>        For those of you who are interested in this field or find
> application of DNA Smith-Waterman, please try out my code and give me
> comments.
>
> Best Regards,
> Yee Man
>
> On Sun, 12 Jan 2003, Ewan Birney wrote:
>
> >
> >
> > On Sat, 11 Jan 2003, Yee Man wrote:
> >
> > >
> > > Hi folks,
> > >
> > > 	I wrote a module that does DNA Smith-Waterman with affine gap
> > > penalty in the form of gap_cost = gap_penalty + extension_penalty * #gaps.
> > > I am wondering if someone here would allow me to add the code to the main
> > > tree.
> > >
> > > 	It is currently implemented with half-hearted Gotoh's improvement.
> > > Instead of storing the scoring matrix as an auxillary array and calculates
> > > it twice, I store the whole thing so that I only need to calculate it
> > > once. I also have a module that implements the whole Gotoh's improvement.
> > > The current one is good with small alignments. I can add the true Gotoh's
> > > improvement module as a fallback when we don't have the memory to run the
> > > doubly faster version.
> > >
> > > 	The code works basically the same as pSW and it returns a
> > > SimpleAlign object at the end. You can check align.pl in the
> > > attached package for details. It shouldn't take me too long to incorporate
> > > the code to the bioperl tree.
> >
> > I will take a look at it and integrate it in.
> >
> > >
> > > Regards,
> > > Yee Man
> > >
> > > PS Ewan: I read the comments in pSW.pm and it says it is going to use
> > > linear space to store the directional matrix. Is that really true? I
> > > remember when I read Ron Shamir's lecture notes, Hirschberg's divide and
> > > conquer only works for Global Alignment and not Local Alignment that SW is
> > > supposed to solve. Did I miss something?
> > >
> >
> > The first pass of the method converts the local alignment to a global
> > alignment by finding the start/end points. then you drop into divide and
> > conquor. my divide and conquor ...ummm... sucks ... because I use a high
> > overhead memory approach (which i call a shadow-matrix). Not worth
> > copying; far easier to write than the forward/backward meeting for complex
> > DPs
> >
> >
> > > PPS What is the name of Phil Green's paper to optimize SW?
> > >
> >
> > SWAT is the program.
> >
> >
> >
> >
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at bioperl.org
> http://bioperl.org/mailman/listinfo/bioperl-l
>

-- 
 Aaron J Mackey
 Pearson Laboratory
 University of Virginia
 (434) 924-2821
 amackey at virginia.edu




More information about the Bioperl-l mailing list