[Bioperl-l] conservative amino acid change?

Aaron J. Mackey <amackey@virginia.edu>
Tue, 7 Aug 2001 10:24:18 -0400 (EDT)

On Tue, 7 Aug 2001, Arlin Stoltzfus wrote:

> A matrix denoting pairwise similarity might take many forms, but PAM
> and BLOSUM are specifically log-odds matrices scaled so as to make
> them useful as match-scores in popular sequence alignment algorithms.
> Matrices with the same information but the wrong scale could not be
> used with these alignment programs, but they might be useful
> whenever one desires arbitrarily-scaled weights or penalties for
> similarity or difference.

Just to be clear, different scalings of a matrix affect only the
resolution available to the alignment algorithm, and usually only
minimally.  For example, if PAM250 were coded in 1/2 bit units, nearly all
the values would round to 0 (it is true that (nearly) all alignment
programs expect integer scores).  On the other hand, a PAM10 scaled in
1/10th bit units will have a much larger range of values but will perform
nearly identically to a PAM10 scaled in 1/2 bit units (since very little
rounding error is introduced at 1/2 bit scale for PAM10).
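
A rough sketch of the rounding effect described above, in Python.  The
numbers are made up for illustration and are NOT real PAM entries: the
"shallow" list only mimics a low-information matrix (small log-odds
magnitudes, as in PAM250) and the "steep" list a high-information one (as
in PAM10).  The function name scale_scores is likewise hypothetical.

```python
def scale_scores(bits, units_per_bit):
    """Convert log-odds scores in bits to integers in 1/units_per_bit bit units."""
    return [round(b * units_per_bit) for b in bits]


# Illustrative values in bits (assumptions, not taken from any real matrix).
shallow = [0.2, -0.2, 0.1, -0.1]   # small magnitudes, like a high-PAM matrix
steep = [3.0, -4.5, 2.0, -6.0]     # large magnitudes, like a low-PAM matrix

# Coarse 1/2 bit units wipe out the shallow matrix entirely ...
shallow_half = scale_scores(shallow, 2)
# ... while 1/10 bit units keep its structure.
shallow_tenth = scale_scores(shallow, 10)

# The steep matrix survives either scale; the two versions differ only by
# a constant factor of 5, so they rank alignments the same way.
steep_half = scale_scores(steep, 2)
steep_tenth = scale_scores(steep, 10)

print(shallow_half, shallow_tenth)
print(steep_half, steep_tenth)
```

With these toy values the shallow scores collapse to zeros at 1/2 bit
scale, while the steep scores lose nothing.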

> The only thing one would need to know is whether the value is higher
> for similar amino acids (thus a similarity matrix) or lower (thus
> a difference or distance matrix).

I think we're going to build Matrix::Similarity objects to be the first
and not the second.  We could certainly build Matrix::Distance objects
that were funny clones of Matrix::Similarity that did the context switch
on the fly.
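
To make the "context switch on the fly" idea concrete, here is a minimal
Python sketch.  It is not the Bioperl design: the class names, the score()
method, and the choice of "maximum score minus score" as the conversion are
all assumptions made for illustration, and the scores are toy values.

```python
class SimilarityMatrix:
    """Hypothetical similarity matrix: higher score means more similar."""

    def __init__(self, scores):
        # Store both orientations so lookups are symmetric.
        self._scores = {}
        for (a, b), s in scores.items():
            self._scores[(a, b)] = s
            self._scores[(b, a)] = s

    def score(self, a, b):
        return self._scores[(a, b)]

    def max_score(self):
        return max(self._scores.values())


class DistanceView:
    """Hypothetical wrapper: lower value means more similar.

    Nothing is copied; each lookup converts the underlying similarity
    score on the fly.  The conversion used here (max - score) is one
    possible choice, assumed for this sketch.
    """

    def __init__(self, similarity):
        self._sim = similarity

    def score(self, a, b):
        return self._sim.max_score() - self._sim.score(a, b)


# Toy scores, not taken from any real matrix.
sim = SimilarityMatrix({("A", "A"): 4, ("S", "S"): 4, ("W", "W"): 4,
                        ("A", "S"): 1, ("A", "W"): -3, ("S", "W"): -3})
dist = DistanceView(sim)
print(sim.score("S", "A"), dist.score("S", "A"), dist.score("A", "W"))
```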

> But converting a given matrix
> into a form that is optimized for use as alignment match-scores
> is apparently something of a black art.

No, it's not; see above.  Changing the scale doesn't affect the information
content of the matrix.  You would, of course, need to scale your gap
penalty function in a corresponding manner, but matrix scale is itself not
a difficult issue.  It's often convenient to have all of your matrices
scaled similarly (for instance, I've run experiments where I used PAM
matrices from 10 to 500, all in 1/20th bit units so that I could compare
raw scores across matrices).
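
As a sketch of what rescaling the matrix and gap penalties together might
look like, in Python: the function names and the convention that gap
penalties are given as negative scores in bits are assumptions made for
this example, and the score values are invented.

```python
def rescale(scores_bits, gap_open_bits, gap_extend_bits, units_per_bit):
    """Put a matrix and its gap penalties on the same integer scale.

    All inputs are in bits; outputs are integers in 1/units_per_bit
    bit units.  Gap penalties are assumed to be negative scores.
    """
    scaled = {pair: round(s * units_per_bit) for pair, s in scores_bits.items()}
    gap_open = round(gap_open_bits * units_per_bit)
    gap_extend = round(gap_extend_bits * units_per_bit)
    return scaled, gap_open, gap_extend


def raw_to_bits(raw_score, units_per_bit):
    """Convert a raw alignment score back to bits."""
    return raw_score / units_per_bit


# Invented log-odds values in bits, rescaled to 1/20th bit units.
toy_bits = {("A", "A"): 1.5, ("A", "S"): 0.5, ("A", "W"): -2.0}
matrix20, gap_open20, gap_extend20 = rescale(toy_bits, -6.0, -1.0, 20)
print(matrix20, gap_open20, gap_extend20)
print(raw_to_bits(60, 20))
```

Because every matrix and its penalties share one unit, raw scores from
different matrices can be compared directly or converted back to bits.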


 o ~   ~   ~   ~   ~   ~  o
/ Aaron J Mackey           \
\  Dr. Pearson Laboratory  /
 \ University of Virginia  \
 /  (434) 924-2821          \
 \  amackey@virginia.edu    /
  o ~   ~   ~   ~   ~   ~  o