[Bioperl-l] Re: Another update for my alignment module
Yee Man Chan
ymc at paxil.stanford.edu
Wed May 21 10:42:00 EDT 2003
On Wed, 21 May 2003, Aaron J Mackey wrote:
>
> Hi Yee, overall this looks fine (although it duplicates some of what's
> already in Bio::Ext::Align). Can you imagine integrating your stuff with
> what's already in Bio::Tools::pSW (or better, having your own
> Bio::Tools::dpAlign, that mimics the behavior of pSW)?
I think the other pSW functionality that my code doesn't have right now is
align_and_show. It seems to me I can just steal his write_pretty_seq_align
function to do the job.
>
> I'd love to see the first thing on your TODO list, combined with a
> Bio::Matrix::SimilarityScore matrix object (wasn't someone working on
> that?).
I think I can call it if I know more about how to use it.
>
> What I think would also be a spectacular item for your TODO list is to
> calculate, for a given match/mismatch or scoring matrix, the expected
> average score per residue, and to throw a warning if it's > 0 (and thus
> not likely to generate local alignments). This bites many new users
> playing with DP algorithms and scoring parameters.
>
Is this like a score threshold to trigger the alignment phase? I think I
can calculate average score per residue but I don't know the formula for
"expected" average score per residue (How does this compare to the
bit-score used by BLAST?). I can probably add this thing as an option
later.
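
For reference, the quantity Aaron describes is usually defined as below. This is only a sketch under stated assumptions, not something taken from either implementation: the symbols p_i (background frequency of residue i) and s_ij (score for aligning residue i with residue j) are introduced here for illustration, and the +5/-4 DNA scores and uniform base frequencies in the worked case are hypothetical.

```latex
% Expected score per aligned residue pair for two unrelated (random) sequences
E = \sum_{i} \sum_{j} p_i \, p_j \, s_{ij}

% Worked case (assumed): DNA, p_i = 1/4 for every base, match = +5, mismatch = -4
% P(match) = 4 \cdot (1/4)^2 = 1/4, \qquad P(mismatch) = 3/4
E = \tfrac{1}{4}(+5) + \tfrac{3}{4}(-4) = 1.25 - 3 = -1.75
```

Here E < 0, so no warning would be needed. With match = +3 and mismatch = -1 the same calculation gives E = 0.75 - 0.75 = 0, which is the boundary case. E is a property of the scoring scheme alone rather than a per-alignment threshold; a BLAST bit score, by contrast, is a normalized score for one particular alignment.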
>
> I think that the developers of vector and other parallel-processed
> DP algorithms would disagree with it being the fastest.
>
Ok. I can clarify by adding that the claim applies to a single CPU.
> > Phil Green (?? yr) introduced
> > heuristics to skip the calculation of some cells.
>
> I would restate this as something like "Phil Green's SWAT implementation
> of the Smith-Waterman algorithm introduced a heuristic that does not
> consider paths through the matrix where the score would be less than the
> gap open penalty, yielding a 1.5-2X speedup on most comparisons".
>
Ok
>
> As I said before, this simply isn't true. The Miller-Myers divide and
> conquer still must use some scoring scheme to calculate the "pivot point"
> where the alignment path crosses the joining row. There is nothing
> stopping you from using the Phil Green SWAT optimization/heuristic to
> calculate those scores and paths. The reason Bill Pearson's ssearch
> doesn't use this optimization during the alignment phase is that speed is
> not critical during the alignment phase, only the search phase.
>
I wrote another email to discuss this.
>
> Since you've already done a very good job of giving credit where it's due,
> I'd say "Bill Pearson's popular DP alignment program SSEARCH uses ... "
>
Ok
>
> You've already said that the Phil Green optimized SW can't be used to
> generate an alignment (which is untrue for the reasons I've already
> given) - what does DPALIGN_LOCAL_GREEN do then?
It means the search phase is using Phil Green's code. If his code can also
do alignment, it will be a complete Phil Green local alignment algorithm.
>
> > 5) For DNA sequences, provides an option to run reverse
> > complement search.
>
> Let the user do this by running pairwise_alignment($seq1, $seq2->revcomp);
>
Ok
> > 6) Support six frames alignment between a DNA sequence and
> > a protein sequence.
>
> Again, it's almost better to let the user do this themselves in bioperl
> space (keep your code and interface simpler):
>
> for my $frame (0 .. 2) {
>     # translate(undef, undef, $frame): default stop/unknown chars, frame offset 0-2
>     push @alns,
>         $factory->pairwise_alignment($dna->translate(undef, undef, $frame), $prot);
> }
>
> $dna = $dna->revcomp;
> # and repeat above for loop.
>
Ok
> As a side note, you need to make sure your similarity matrix has scores
> for "B", "Z", "X" and "*", if it doesn't already.
>
I looked at the blosum62.mat file in FASTA. It doesn't have "*". Is the
stop codon really an aa?
Regards,
Yee Man
> -Aaron
>