[Bioperl-l] conservative amino acid change?

Arlin Stoltzfus arlin@carb.nist.gov
Tue, 07 Aug 2001 09:37:56 -0400

Heikki Lehvaslaiho wrote:
> I do not expect there to be dozens of objects here so why not:

There are dozens of different pairwise matrices for amino acids, 
and (literally) hundreds of different indices (i.e., non-pairwise 
measures) of physicochemical properties.  See the AAIndex database:


> Bio::Matrix::Similarity
> Bio::Matrix::PAM
> Bio::Matrix::BLOSUM
> How different can the matrixes be? 

A matrix denoting pairwise similarity might take many forms, but PAM 
and BLOSUM are specifically log-odds matrices scaled so as to make 
them useful as match-scores in popular sequence alignment algorithms.  
Matrices with the same information but the wrong scale could not be 
used with these alignment programs, but they might be useful 
whenever one desires arbitrarily-scaled weights or penalties for 
similarity or difference.     
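
To make the scaling point concrete, here is a minimal Python sketch of how a log-odds score is formed: S_ij = round((1/lambda) * ln(q_ij / (p_i p_j))), where q_ij is the target (aligned-pair) frequency and p_i, p_j are background frequencies. The frequencies below are made-up numbers for illustration only. lambda = ln(2)/2 gives half-bit units, the scale used for BLOSUM62. Rescoring the same frequencies in whole bits shows how identical information yields different integers.

```python
import math

def log_odds_score(q_ij, p_i, p_j, lam):
    """Integer log-odds score S_ij = round(ln(q_ij / (p_i * p_j)) / lam)."""
    return round(math.log(q_ij / (p_i * p_j)) / lam)

HALF_BITS = math.log(2) / 2  # lambda for half-bit units
BITS = math.log(2)           # lambda for whole-bit units

# Hypothetical frequencies (not taken from any real matrix).
p = {"A": 0.10, "R": 0.05}
q = {("A", "A"): 0.030, ("A", "R"): 0.004}

for (a, b), q_ij in q.items():
    print((a, b),
          "half-bits:", log_odds_score(q_ij, p[a], p[b], HALF_BITS),
          "bits:", log_odds_score(q_ij, p[a], p[b], BITS))
```

Both integer tables encode the same frequencies, but gap penalties and score statistics tuned for half-bit scores would not carry over to the whole-bit version.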

> Do we need a class for every type
> of similarity matrix or is it enough to have one class
> (Bio::Matrix::Similarity) with an attribute/method format(PAM|BLOSUM)
> to tell how the values were generated? Go for separate classes only if
> there will be methods in one which are not relevant in the other.
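
The single-class suggestion can be sketched in a few lines of Python. This is a hypothetical illustration, not BioPerl code: the class name `SimilarityMatrix`, the `fmt` values and the parser are assumptions. It reads the whitespace-delimited layout used by NCBI-style matrix files (a header row of residue letters, then one row per residue, `#` for comments); the values are an excerpt of BLOSUM62 for A, R and N only.

```python
# Excerpt of BLOSUM62 (A, R, N only), in the NCBI-style text layout.
BLOSUM62_EXCERPT = """\
# BLOSUM62 excerpt
   A  R  N
A  4 -1 -2
R -1  5  0
N -2  0  6
"""

class SimilarityMatrix:
    """One class for all scoring matrices; `format` records how values were made."""

    FORMATS = {"PAM", "BLOSUM", "OTHER"}

    def __init__(self, text, fmt):
        if fmt not in self.FORMATS:
            raise ValueError(f"unknown matrix format: {fmt}")
        self.format = fmt
        self._scores = {}
        # Skip blank lines and '#' comment lines.
        rows = [line.split() for line in text.splitlines()
                if line.strip() and not line.lstrip().startswith("#")]
        header = rows[0]
        for fields in rows[1:]:
            residue, values = fields[0], fields[1:]
            for column, value in zip(header, values):
                self._scores[(residue, column)] = int(value)

    def score(self, a, b):
        """Return the score for aligning residue a with residue b."""
        return self._scores[(a, b)]

blosum = SimilarityMatrix(BLOSUM62_EXCERPT, fmt="BLOSUM")
print(blosum.format, blosum.score("A", "R"), blosum.score("N", "N"))
```

With this design a method such as `score` is shared by every format; separate classes would only pay off for methods that make sense for one kind of matrix and not another.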

As Aaron Mackey suggested, some matrices are instantaneous rate 
matrices.  One might wish to have different methods for these than 
for the log-odds scoring matrices. 
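
One method that makes sense for a rate matrix but not for a log-odds table is computing substitution probabilities over an elapsed time t, P(t) = exp(Qt). The Python sketch below assumes a toy equal-rates (Poisson) model over 20 amino acids in place of an empirical rate matrix; `RateMatrix`, `transition_probabilities` and the helper functions are hypothetical names. The matrix exponential uses scaling and squaring with a truncated Taylor series, which is adequate for a small example but is not meant as a production method.

```python
import math

def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(m, terms=20):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    n = len(m)
    norm = max(sum(abs(x) for x in row) for row in m)
    # Scale so the matrix norm is at most 0.5, where the series converges fast.
    s = max(0, math.ceil(math.log2(norm / 0.5))) if norm > 0 else 0
    scaled = [[x / 2 ** s for x in row] for row in m]
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    result = [row[:] for row in identity]
    term = [row[:] for row in identity]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, scaled)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    # Undo the scaling: exp(M) = exp(M / 2^s) ^ (2^s).
    for _ in range(s):
        result = mat_mul(result, result)
    return result

class RateMatrix:
    """Instantaneous rate matrix Q: off-diagonal rates >= 0, rows sum to zero."""

    def __init__(self, q):
        for row in q:
            if abs(sum(row)) > 1e-9:
                raise ValueError("rows of a rate matrix must sum to zero")
        self.q = q

    def transition_probabilities(self, t):
        """P(t) = exp(Q t): probability of residue i becoming j after time t."""
        return mat_exp([[x * t for x in row] for row in self.q])

def poisson_rate_matrix(n=20):
    """Equal rates between all n states, scaled to one substitution per unit time."""
    mu = 1.0 / (n - 1)
    return [[-(n - 1) * mu if i == j else mu for j in range(n)] for i in range(n)]

Q = RateMatrix(poisson_rate_matrix(20))
P = Q.transition_probabilities(0.5)
```

For this equal-rates model the result can be checked against the closed form P_ii(t) = 1/20 + (19/20) exp(-20 mu t); a log-odds matrix has no comparable notion of t, which is one reason to give rate matrices their own methods.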

But some methods could be general: any pairwise amino acid matrix 
might be used as a matrix of arbitrarily-scaled similarity or difference.  
The only thing one would need to know is whether the value is higher 
for similar amino acids (thus a similarity matrix) or lower (thus 
a difference or distance matrix). 
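
Because the direction of the scale is the only extra fact generic code needs, a single flag is enough for a simple test such as "is a -> b a conservative change?". In this hypothetical Python sketch, a pairwise difference matrix is derived from one per-residue index (Kyte-Doolittle hydropathy, chosen here only as an example) and compared with a small BLOSUM62 excerpt. The class and method names are assumptions, and the thresholds in particular are arbitrary.

```python
# Kyte-Doolittle hydropathy values: one of many per-residue indices.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

class PairwiseMatrix:
    """Any pairwise amino acid matrix, plus the direction of its scale."""

    def __init__(self, scores, higher_is_similar):
        self.scores = scores                      # dict: (a, b) -> value
        self.higher_is_similar = higher_is_similar

    def is_conservative(self, a, b, threshold):
        """True if a -> b is at least as similar as the given threshold."""
        value = self.scores[(a, b)]
        if self.higher_is_similar:
            return value >= threshold             # similarity: higher is closer
        return value <= threshold                 # difference: lower is closer

def difference_matrix(index):
    """Turn a per-residue index into pairwise differences |x_a - x_b|."""
    return {(a, b): abs(xa - xb)
            for a, xa in index.items() for b, xb in index.items()}

hydropathy = PairwiseMatrix(difference_matrix(KYTE_DOOLITTLE),
                            higher_is_similar=False)
blosum_excerpt = PairwiseMatrix(
    {("A", "A"): 4, ("A", "R"): -1, ("R", "A"): -1, ("R", "R"): 5},
    higher_is_similar=True)

print(hydropathy.is_conservative("I", "V", threshold=1.0))
print(blosum_excerpt.is_conservative("A", "R", threshold=0))
```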

I'm afraid that for anything more complicated, you might have to 
make the matrices carry their own methods; for example, a matrix of 
differences would include a method (such as S_ij = 1 - D_ij) 
to compute a similarity matrix.  But converting a given matrix 
into a form that is optimized for use as alignment match-scores 
is apparently something of a black art.  What do you foresee as 
the most common applications of pairwise matrices of similarity,  
difference, rates, weights, and so on?
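
The conversion S_ij = 1 - D_ij only yields sensible similarities if D_ij lies between 0 and 1. Here is a minimal Python sketch, assuming D is first min-max rescaled to [0, 1] (an assumption; the message does not specify a normalization). `to_similarity` is a hypothetical name, and the function does nothing to make the result suitable as alignment match-scores.

```python
def to_similarity(dist):
    """Map differences onto similarities via S_ij = 1 - D_ij.

    D is first min-max rescaled to [0, 1] so that 1 - D also stays in [0, 1].
    """
    lo, hi = min(dist.values()), max(dist.values())
    if hi == lo:
        raise ValueError("all differences are equal; nothing to rescale")
    return {pair: 1.0 - (d - lo) / (hi - lo) for pair, d in dist.items()}

# Hydropathy differences for a few pairs (Kyte-Doolittle: I 4.5, V 4.2, R -4.5).
differences = {("I", "I"): 0.0, ("I", "V"): 0.3, ("I", "R"): 9.0}
similarities = to_similarity(differences)
print(similarities)
```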