[Bioperl-l] conservative amino acid change?
Aaron J. Mackey <email@example.com>
Tue, 7 Aug 2001 10:24:18 -0400 (EDT)
On Tue, 7 Aug 2001, Arlin Stoltzfus wrote:
> A matrix denoting pairwise similarity might take many forms, but PAM
> and BLOSUM are specifically log-odds matrices scaled so as to make
> them useful as match-scores in popular sequence alignment algorithms.
> Matrices with the same information but the wrong scale could not be
> used with these alignment programs, but they might be useful
> whenever one desires arbitrarily-scaled weights or penalties for
> similarity or difference.
Just to be clear, different scalings of a matrix will only minimally
affect the resolution of the alignment algorithm. For example, if PAM250
were coded in 1/2 bit units, all the values would be nearly 0 due to
rounding error (it is true that nearly all alignment programs expect
integer scores). On the other hand, a PAM10 scaled in 1/10th bit units
will have a much larger range of values but will perform nearly
identically to a PAM10 scaled in 1/2 bit units (since very little
rounding error is introduced at 1/2 bit scale for PAM10).
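To make the rounding point concrete, here is a minimal Python sketch. The
bit scores are made-up illustrative numbers, not real PAM values, and the
helper names scale_and_round and max_rounding_error_bits are hypothetical.
It scales log-odds scores (in bits) to integer units and reports the
worst-case error that rounding introduces.

```python
def scale_and_round(bit_scores, units_per_bit):
    """Express scores given in bits as integers in 1/units_per_bit bit units."""
    return [round(s * units_per_bit) for s in bit_scores]


def max_rounding_error_bits(bit_scores, units_per_bit):
    """Largest difference, in bits, between a true score and its rounded form."""
    rounded = scale_and_round(bit_scores, units_per_bit)
    return max(abs(s - r / units_per_bit) for s, r in zip(bit_scores, rounded))


# Made-up scores: a "shallow" set with small magnitudes and a "steep" set
# with large magnitudes. These are assumptions for illustration only.
shallow = [-0.4, -0.1, 0.2, 0.6, 1.1]
steep = [-7.3, -4.8, -1.9, 3.6, 6.2]

for name, scores in (("shallow", shallow), ("steep", steep)):
    for units in (2, 10):
        print(name, units, scale_and_round(scores, units),
              max_rounding_error_bits(scores, units))
```

In 1/2 bit units the shallow set loses one of its five distinct values,
while the steep set keeps all five at either scale. The absolute error is
bounded by half a unit in both cases, so what matters is how large that
bound is relative to the scores themselves.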
> The only thing one would need to know is whether the value is higher
> for similar amino acids (thus a similarity matrix) or lower (thus
> a difference or distance matrix).
I think we're going to build Matrix::Similarity objects to be the first
(similarity) and not the second (distance). We could certainly build
Matrix::Distance objects that were funny clones of Matrix::Similarity
that did the context switch on the fly.
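As a rough sketch of what an on-the-fly "context switch" could look like
(in Python, not Bioperl; the class names SimilarityMatrix and DistanceView
and the toy scores are assumptions, not an existing API), a distance
object can wrap a similarity object and flip the sense of each lookup.
The conversion used here, distance = max similarity - similarity, is just
one simple convention.

```python
class SimilarityMatrix:
    """Symmetric lookup table where higher values mean more similar."""

    def __init__(self, scores):
        self._scores = {}
        for (a, b), s in scores.items():
            # Store both orders so lookups are symmetric.
            self._scores[(a, b)] = s
            self._scores[(b, a)] = s

    def score(self, a, b):
        return self._scores[(a, b)]

    def max_score(self):
        return max(self._scores.values())


class DistanceView:
    """Wraps a SimilarityMatrix; lower values mean more similar."""

    def __init__(self, similarity):
        self._sim = similarity
        self._max = similarity.max_score()

    def score(self, a, b):
        # Assumed convention: subtract from the largest similarity.
        return self._max - self._sim.score(a, b)


# Toy illustrative scores for three residues.
sim = SimilarityMatrix({
    ("L", "L"): 4, ("L", "I"): 2, ("L", "D"): -4,
    ("I", "I"): 4, ("I", "D"): -3, ("D", "D"): 6,
})
dist = DistanceView(sim)
print(dist.score("L", "I"), dist.score("L", "D"))
```

Note that under this convention a residue's distance to itself need not
be zero (here L to L gives 2), so the result is a dissimilarity score
rather than a true metric.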
> But converting a given matrix
> into a form that is optimized for use as alignment match-scores
> is apparently something of a black art.
No, it's not; see above. Changing the scale doesn't affect the information
content of the matrix. You would, of course, need to scale your gap
penalty function in a corresponding manner, but matrix scale is itself not
a difficult issue. It's often convenient to have all of your matrices
scaled similarly (for instance, I've run experiments where I used PAM
matrices from 10 to 500, all in 1/20th bit units so that I could compare
raw scores between matrix usages).
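The point about scaling the gap penalty along with the matrix can be
checked with a small Python sketch. It assumes a plain Needleman-Wunsch
global alignment with a linear gap penalty and a toy match/mismatch
scoring function; global_score, toy_score and scaled are hypothetical
names for illustration, not anything from an existing alignment program.

```python
def global_score(x, y, score, gap):
    """Optimal global alignment score (Needleman-Wunsch, linear gap cost)."""
    prev = [-gap * j for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        curr = [-gap * i]
        for j in range(1, len(y) + 1):
            curr.append(max(
                prev[j - 1] + score(x[i - 1], y[j - 1]),  # align two residues
                prev[j] - gap,                            # gap in y
                curr[j - 1] - gap,                        # gap in x
            ))
        prev = curr
    return prev[-1]


def toy_score(a, b):
    """Toy similarity: an assumption standing in for a real matrix."""
    return 2 if a == b else -1


def scaled(score, factor):
    """Return the same scoring function expressed in finer units."""
    return lambda a, b: factor * score(a, b)


x, y = "HEAGAWGHEE", "PAWHEAE"
base = global_score(x, y, toy_score, gap=2)
both_scaled = global_score(x, y, scaled(toy_score, 10), gap=20)
matrix_only = global_score(x, y, scaled(toy_score, 10), gap=2)
print(base, both_scaled, matrix_only)
```

When the matrix and the gap penalty are multiplied by the same factor,
every candidate alignment's score is multiplied by that factor, so the
optimum is unchanged and both_scaled equals 10 * base. Scaling only the
matrix makes gaps relatively cheaper, which is why matrix_only can differ.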
o ~ ~ ~ ~ ~ ~ o
/ Aaron J Mackey \
\ Dr. Pearson Laboratory /
\ University of Virginia \
/ (434) 924-2821 \
\ firstname.lastname@example.org /
o ~ ~ ~ ~ ~ ~ o