[Bioperl-l] Re: Another update for my alignment module

Yee Man Chan ymc at paxil.stanford.edu
Wed May 21 10:42:00 EDT 2003



On Wed, 21 May 2003, Aaron J Mackey wrote:

> 
> Hi Yee, overall this looks fine (although it duplicates some of what's
> already in Bio::Ext::Align).  Can you imagine integrating your stuff with
> what's already in Bio::Tools::pSW (or better, having your own
> Bio::Tools::dpAlign, that mimics the behavior of pSW)?  

I think the other pSW functionality that my code doesn't have right now is 
align_and_show. It seems to me I can just steal his write_pretty_seq_align
function to do the job.

> 
> I'd love to see the first thing on your TODO list, combined with a
> Bio::Matrix::SimilarityScore matrix object (wasn't someone working on
> that?).

I think I can call it once I know more about how to use it.

> 
> What I think would also be a spectacular item for your TODO list is to
> calculate, for a given match/mismatch or scoring matrix, the expected
> average score per residue, and to throw a warning if it's > 0 (and thus
> not likely to generate local alignments).  This bites many new users
> playing with DP algorithms and scoring parameters.
> 

Is this like a score threshold to trigger the alignment phase? I think I
can calculate average score per residue but I don't know the formula for
"expected" average score per residue (How does this compare to the
bit-score used by BLAST?). I can probably add this thing as an option
later.
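
For reference, the quantity Aaron describes is usually computed from background residue frequencies p. It is the expected score of aligning two residues drawn independently: E = sum over a, b of p_a * p_b * s(a, b). A negative E is what keeps Smith-Waterman alignments local.

Below is a minimal Python sketch, assuming uniform DNA base frequencies and a simple match/mismatch scheme. The function names are hypothetical and not part of any BioPerl module.

```python
import warnings
from itertools import product


def expected_score(freqs, score):
    """Expected score for aligning two residues drawn independently
    from the background frequencies `freqs` (residue -> probability)."""
    return sum(freqs[a] * freqs[b] * score(a, b)
               for a, b in product(freqs, repeat=2))


def match_mismatch(match, mismatch):
    """Scoring function for a simple match/mismatch scheme."""
    return lambda a, b: match if a == b else mismatch


def check_local(freqs, score):
    """Warn when the expected score is not negative, since the
    alignments found will then tend to run the full length (not local)."""
    e = expected_score(freqs, score)
    if e >= 0:
        warnings.warn(f"expected score per residue is {e:.3f} (>= 0); "
                      "local alignments are unlikely")
    return e


uniform = {base: 0.25 for base in "ACGT"}
print(expected_score(uniform, match_mismatch(1, -1)))  # -0.5
check_local(uniform, match_mismatch(4, -1))            # warns: 0.25
```

With uniform frequencies, +1/-1 gives -0.5 and +5/-4 gives -1.75. In contrast, +4/-1 gives +0.25 and would trigger the warning.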

> 
> I think that the developers of vector and other parallel-processed
> DP algorithms would disagree with it being the fastest.
> 

OK. I can clarify this by adding that it is the fastest on a single CPU.

> >       Phil Green (?? yr) introduced
> > 	heuristics to skip the calculation of some cells.
> 
> I would restate this as something like "Phil Green's SWAT implementation
> of the Smith-Waterman algorithm introduced a heuristic that does not
> consider paths through the matrix where the score would be less than the
> gap open penalty, yielding a 1.5-2X speedup on most comparisons".
> 

Ok

> 
> As I said before, this simply isn't true.  The Miller-Myers divide and
> conquer still must use some scoring scheme to calculate the "pivot point"
> where the alignment path crosses the joining row.  There is nothing
> stopping you from using the Phil Green SWAT optimization/heuristic to
> calculate those scores and paths.  The reason Bill Pearson's ssearch
> doesn't use this optimization during the alignment phase is that speed is
> not critical during the alignment phase, only the search phase.
> 

I wrote another email to discuss this.
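
Aaron's point about the pivot can be made concrete. The divide-and-conquer step only needs the scores of the best paths that reach the middle row from each end, and any scoring routine that produces those scores will do.

Below is a minimal Python sketch of the pivot search. It uses Hirschberg's linear-gap, global-alignment form; Myers and Miller extended this to affine gaps. The function names are hypothetical, and the scores (+1 match, -1 mismatch, -2 per gap) are assumed for illustration.

```python
def nw_last_row(a, b, match=1, mismatch=-1, gap=-2):
    """Last row of the global (Needleman-Wunsch) score matrix of a vs b,
    computed with two rows of memory and a linear gap penalty."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + s,   # align a[i-1] with b[j-1]
                         prev[j] + gap,     # gap in b
                         cur[j - 1] + gap)  # gap in a
        prev = cur
    return prev


def pivot(a, b, **scores):
    """Split a at its middle row and return (mid, j, score): the column j
    where an optimal path crosses that row, and the optimal total score."""
    mid = len(a) // 2
    forward = nw_last_row(a[:mid], b, **scores)
    # Scores from the far corner back to the middle row, via reversed strings.
    backward = nw_last_row(a[mid:][::-1], b[::-1], **scores)
    n = len(b)
    totals = [forward[j] + backward[n - j] for j in range(n + 1)]
    best = max(range(n + 1), key=totals.__getitem__)
    return mid, best, totals[best]


print(pivot("ACGT", "ACGT"))  # (2, 2, 4)
```

The best combined score equals the score of the full alignment. The problem can therefore be split at (mid, j) and each half solved recursively. Nothing here depends on how the two score rows were computed.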

> 
> Since you've already done a very good job of giving credit where it's due,
> I'd say "Bill Pearson's popular DP alignment program SSEARCH uses ... "
> 

Ok

> 
> You've already said that the Phil Green optimized SW can't be used to
> generate an alignment (which is untrue for the reasons I've already
> given) - what does DPALIGN_LOCAL_GREEN do then?

It means the search phase is using Phil Green's code. If his code can also
do alignment, it will be a complete Phil Green local alignment algorithm.

> 
> > 	5) For DNA sequences, provides an option to run reverse
> > 	complement search.
> 
> Let the user do this by running pairwise_alignment($seq1, $seq2->revcomp);
> 

Ok

> > 	6) Support six-frame alignment between a DNA sequence and
> > 	a protein sequence.
> 
> Again, it's almost better to let the user do this themselves in bioperl
> space (keep your code and interface simpler):
> 
> my @alns;
> # three frames on the forward strand, then three on the reverse strand
> for my $strand ($dna, $dna->revcomp) {
>   for my $frame (0 .. 2) {
>     push @alns,
>       $factory->pairwise_alignment($strand->translate(undef, undef, $frame), $prot);
>   }
> }
> 

Ok
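
The loop above relies on BioPerl's translate and revcomp. For readers without BioPerl, here is a minimal Python sketch of the same six-frame idea. It assumes the standard genetic code (NCBI table 1) and maps any codon containing a non-ACGT base to X. Stop codons come out as "*", which is why the scoring matrix needs a "*" entry. The function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# NCBI translation table 1 (standard code); codons in TTT, TTC, TTA, TTG, TCT, ... order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(dna):
    """Reverse complement of an uppercase or lowercase DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna, frame=0):
    """Translate from offset `frame`; codons with non-ACGT bases become 'X'."""
    dna = dna.upper()
    return "".join(CODONS.get(dna[i:i + 3], "X")
                   for i in range(frame, len(dna) - 2, 3))


def six_frames(dna):
    """Frames 0-2 of the forward strand, then frames 0-2 of the reverse complement."""
    return [translate(strand, f) for strand in (dna, revcomp(dna)) for f in range(3)]


print(six_frames("ATGGCCTAA"))  # ['MA*', 'WP', 'GL', 'LGH', '*A', 'RP']
```

Each of the six peptides would then be aligned against the protein in turn, exactly as in the Perl loop.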

> As a side note, you need to make sure your similarity matrix has scores
> for "B", "Z", "X" and "*", if it doesn't already.
> 

I looked at the blosum62.mat file in FASTA. It doesn't have "*". Is the
stop codon really an amino acid?
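
One way to handle a matrix file that lacks "*" is to pad it before aligning. Below is a minimal sketch, assuming the matrix is held as a dict of dicts. The default scores are hypothetical placeholders: -4 against other residues and 1 against itself, matching the "*" row in NCBI's BLOSUM62 file. Published matrices give B, Z and X their own scores, so those should be taken from the matrix rather than defaulted.

```python
def ensure_symbols(matrix, symbols="*", default=-4, self_score=1):
    """Return a copy of a dict-of-dicts substitution matrix in which every
    symbol in `symbols` has a full row and column.

    Only missing entries are filled: `self_score` on the diagonal and
    `default` elsewhere (hypothetical placeholder values)."""
    out = {a: dict(row) for a, row in matrix.items()}
    alphabet = set(out) | set(symbols)
    for a in alphabet:
        row = out.setdefault(a, {})
        for b in alphabet:
            row.setdefault(b, self_score if a == b else default)
    return out


blosum_fragment = {"A": {"A": 4, "R": -1}, "R": {"A": -1, "R": 5}}
padded = ensure_symbols(blosum_fragment)
print(padded["*"]["*"], padded["A"]["*"])  # 1 -4
```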

Regards,
Yee Man

> -Aaron
> 


