[Bioperl-l] Re: Another update for my alignment module
Yee Man Chan
ymc at paxil.stanford.edu
Wed May 28 13:23:55 EDT 2003
Hi Aaron and Ewan,
I implemented the function align_and_show by calling Ewan's
write_pretty_seq_align. I believe now functionality-wise, my code is on
par of Ewan's code. I think in the next step, I will need to merge my
Align.xs and Ewan's Align.xs. How can I do that? My updated code is at
http://www.stanford.edu/~yeeman/dpalign.tgz
Thanks
Yee Man
On Wed, 21 May 2003, Aaron J Mackey wrote:
>
> Hi Yee, overall this looks fine (although it duplicates some of what's
> already in Bio::Ext::Align). Can you imagine integrating your stuff with
> what's already in Bio::Tools::pSW (or better, having your own
> Bio::Tools::dpAlign, that mimics the behavior of pSW)? I see your
> $factory->pairwise_alignment($seq1, $seq2) already does look like pSW, so
> that's great.
>
> I'd love to see the first thing on your TODO list, combined with a
> Bio::Matrix::SimilarityScore matrix object (wasn't someone working on
> that?).
>
> What I think would also be a spectacular item for your TODO list is to
> calculate, for a given match/mismatch or scoring matrix, the expected
> average score per residue, and to throw a warning of it's > 0 (and thus
> not likely to generate local alignments). This bites many new users
> playing with DP algorithms and scoring parameters.
>
> > It is accepted
> > that Miller-Myers implementation is the fastest and using the
> > least memory that is truly equivalent to original algorithm
> > introduced by Needleman-Wunsch.
>
> I think that the developers of vector and other parallel-processed
> DP algorithms would disagree with it being the fastest.
>
> > Phil Green (?? yr) introduced
> > heuristics to skip the calculation of some cells.
>
> I would restate this as something like "Phil Green's SWAT implementation
> of the Smith-Waterman algorithm introduced a heuristic that does not
> consider paths through the matrix where the score would be less than the
> gap open penalty, yielding a 1.5-2X speedup on most comparisons".
>
>
> > However,
> > his approach is only good for calculating the minimum edit
> > distance and find out the corresponding subsequences (aka
> > search phase).
>
> As I said before, this simply isn't true. The Miller-Myers divide and
> conquer still must use some scoring scheme to calculate the "pivot point"
> where the alignment path crosses the joining row. There is nothing
> stopping you from using the Phil Green SWAT optimization/heuristic to
> calculate those scores and paths. The reason Bill Pearson's ssearch
> doesn't use this optimization during the alignment phase is that speed is
> not critical during the alignment phase, only the search phase.
>
> > The most popular dynamic programming alignment
> > program ssearch uses Phil Green's algorithm to find the
> > subsequences and then Miller-Myers's algorithm to find the
> > actual alignment. (aka alignment phase)
>
> Since you've already done a very good job of giving credit where it's due,
> I'd say "Bill Pearson's popular DP alignment program SSEARCH uses ... "
>
> > It allows you
> > to specify either the Phil Green (DPALIGN_LOCAL_GREEN)
> > or Miller-Myers (DPALIGN_LOCAL_MILLER_MYERS).
>
> You've already said that the Phil Green optimized SW can't be used to
> generate an alignment (which is untrue for the reason's I've already
> given) - what does DPALIGN_LOCAL_GREEN do then?
>
> > 5) For DNA sequences, provides an option to run reverse
> > complement search.
>
> Let the user do this by running pairwise_alignment($seq1, $seq2->revcomp);
>
> > 6) Support six frames alignment between a DNA sequence and
> > a protein sequence.
>
> Again, it's almost better to let the user do this themselves in bioperl
> space (keep your code and interface simpler):
>
> for $frame (0 .. 2) {
> push @alns,
> $factory->pairwise_alignment($dna->translate(undef, undef, $frame), $prot);
> }
>
> $dna = $dna->revcomp;
> # and repeat above for loop.
>
> As a side note, you need to make sure your similarity matrix has scores
> for "B", "Z", "X" and "*", if it doesn't already.
>
> -Aaron
>
More information about the Bioperl-l
mailing list