[Bioperl-l] An update for my DNA Smith-Waterman code

Sat Jan 25 23:21:43 EST 2003

On Sat, 25 Jan 2003, Aaron J Mackey wrote:

> 
> Yee, you might also consider comparing your code and performance to that
> of SSEARCH in the FASTA package (found in dropgsw.c) - this is a
> linear-space SW, includes affine gap penalties, and has the Phil Green
> SWAT optimization; and if you have a way of making that faster, the world
> will beat a path to your door.  

Hi Aaron

I am aware of this implementation and I have the code. However, I can't
seem to compile the package on my SPARC. Also I would like to know if
there are any paper that descries his algorithm before I dive into the
code. Do you know such a paper? Do you have a short C program that reads
two FASTAs and calls the functions in dropgsw.c? 

By the way, why didn't Ewan's pSW.pm use SWAT already if SWAT is so good?
Is it because of licensing issues?

> If you're developing under OS X on a Mac
> G4, I can also send you the patches to make it altivec-enabled (and
> approx. 6-8 times faster).
> 

Is this something that multi-threads the program to run on multiple CPUs?

> The basic SSEARCH code is also that used by the EMBOSS tools; it's
> well-tested code against which you should be able to compare your results
> to ensure alignment accuracy.

I tried it but it seems to me ssearch34 only display the score only. How
can I get the alignment? How can I set gap penalty? I read the
accompanied fasta3x.doc but I still can't figure it out...

Also, when I run it with the 10k sequences I used, it gave me this error:
!! No library sequences with E() < 2

Why? You can download my test sequences (t1.fa and t2.fa) at
http://www.stanford.edu/~yeeman/

Thanks a lot.
Yee Man

> 
> -Aaron
> 
> On Sat, 25 Jan 2003, Yee Man wrote:
> 
> >
> > Hi, folks,
> >
> > 	I updated my DNA Smith-Waterman to include the Gotoh way as of his
> > 1981 paper. The algorithm I previously implemented is a modified Gotoh
> > algorithm that maximizes the speed regardless of the memory usage. Here is
> > a performance comparison between the two when I tried to align two
> > very similar sequences with 10k bp each in my SPARC:
> >
> > 1) True Gotoh
> > max memory: 35MB
> > time: 1 min 54 sec
> >
> > 2) Modified Gotoh
> > max memory: 395MB
> > time: 1 min 3 sec
> >
> > 	As expected, the true Gotoh algorithm is two times slower because
> > it needs another run to mark the gap extensions in the other
> > sequence. The code can be downloaded from
> > http://www.stanford.edu/~yeeman/dsw.tar.gz
> >
> > Future work:
> > 1) Myers et al's Linear Space implementation (Should take even less space
> > than True Gotoh but it can take twice as much time because we need 2
> > passes to find the alignment end points [not sure about this actually
> > but I don't know how to do it in one pass] and then another 2 passes to do
> > the linear space global alignment.)
> > 2) An implementation for zero gap opening penalty (Can be implemented as
> > fast as Modified Gotoh but memory usage is equivalent to True Gotoh)
> > 3) Support for global alignment and ends free alignment (ie no gap penalty
> > for gaps at the ends of sequences, good for aligning overlapping clones.)
> >
> >        For those of you who are interested in this field or find
> > application of DNA Smith-Waterman, please try out my code and give me
> > comments.
> >
> > Best Regards,
> > Yee Man
> >
> > On Sun, 12 Jan 2003, Ewan Birney wrote:
> >
> > >
> > >
> > > On Sat, 11 Jan 2003, Yee Man wrote:
> > >
> > > >
> > > > Hi folks,
> > > >
> > > > 	I wrote a module that does DNA Smith-Waterman with affine gap
> > > > penalty in the form of gap_cost = gap_penalty + extension_penalty * #gaps.
> > > > I am wondering if someone here would allow me to add the code to the main
> > > > tree.
> > > >
> > > > 	It is currently implemented with half-hearted Gotoh's improvement.
> > > > Instead of storing the scoring matrix as an auxillary array and calculates
> > > > it twice, I store the whole thing so that I only need to calculate it
> > > > once. I also have a module that implements the whole Gotoh's improvement.
> > > > The current one is good with small alignments. I can add the true Gotoh's
> > > > improvement module as a fallback when we don't have the memory to run the
> > > > doubly faster version.
> > > >
> > > > 	The code works basically the same as pSW and it returns a
> > > > SimpleAlign object at the end. You can check align.pl in the
> > > > attached package for details. It shouldn't take me too long to incorporate
> > > > the code to the bioperl tree.
> > >
> > > I will take a look at it and integrate it in.
> > >
> > > >
> > > > Regards,
> > > > Yee Man
> > > >
> > > > PS Ewan: I read the comments in pSW.pm and it says it is going to use
> > > > linear space to store the directional matrix. Is that really true? I
> > > > remember when I read Ron Shamir's lecture notes, Hirschberg's divide and
> > > > conquer only works for Global Alignment and not Local Alignment that SW is
> > > > supposed to solve. Did I miss something?
> > > >
> > >
> > > The first pass of the method converts the local alignment to a global
> > > alignment by finding the start/end points. then you drop into divide and
> > > conquor. my divide and conquor ...ummm... sucks ... because I use a high
> > > overhead memory approach (which i call a shadow-matrix). Not worth
> > > copying; far easier to write than the forward/backward meeting for complex
> > > DPs
> > >
> > >
> > > > PPS What is the name of Phil Green's paper to optimize SW?
> > > >
> > >
> > > SWAT is the program.
> > >
> > >
> > >
> > >
> >
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l at bioperl.org
> > http://bioperl.org/mailman/listinfo/bioperl-l
> >
> 
> -- 
>  Aaron J Mackey
>  Pearson Laboratory
>  University of Virginia
>  (434) 924-2821
>  amackey at virginia.edu
> 
>