[Bioperl-l] conservative amino acid change?
Aaron J. Mackey <amackey@virginia.edu>
Mon, 6 Aug 2001 11:48:32 -0400 (EDT)
On Mon, 6 Aug 2001, Heikki Lehvaslaiho wrote:
> Sound very good. Let's do it. Send them over.
Before we get too far with this, there's one serious consideration:
calculating substitution matrices from instantaneous rate matrices
involves some heavy matrix math (namely, picking the instantaneous rate
matrix apart into its respective eigenvectors and eigenvalues). Of
course all things are possible in Perl, but previously I've relied on PDL
(Perl Data Language) functions to do this stuff for me. Yet another
module dependency (and quite a big one, to boot).
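For reference, the eigen-decomposition step can be written out as below. This is only a sketch of the standard relationship; the symbols Q (instantaneous rate matrix), U (matrix of eigenvectors), lambda_i (eigenvalues), and t (evolutionary distance) are notation assumed here, not taken from any existing BioPerl code, and it assumes Q is diagonalizable.

```latex
Q = U \Lambda U^{-1}, \qquad \Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n)

P(t) = e^{Qt} = U \, \operatorname{diag}\!\left(e^{\lambda_1 t}, \dots, e^{\lambda_n t}\right) U^{-1}
```

Once U and the eigenvalues are known, a substitution probability matrix at any distance t costs only a few matrix products, which is why the decomposition is worth the dependency.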
Alternatively, if speed were a concern, I'd be happy to write an Inline::C
version of it; but again, another module dependency (but a much easier one
to overcome, and becoming a fairly standard "must have" module as more
and more stuff gets written in C).
So perhaps you'd like to start with a simpler version of matrix evolution:
store the PAM1, PAM10, and PAM100 substitution matrices, and then use
simple matrix multiplication to get where you need to go (PAM120 = PAM100
* PAM10 * PAM10, PAM232 = PAM100 * PAM100 * PAM10 * PAM10 * PAM10 * PAM1 *
PAM1, etc).
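The multiplication scheme above can be sketched roughly as follows. This is an illustration in Python rather than Perl, and everything in it is an assumption made for the example: the function names (`mat_mul`, `mat_pow`, `decompose`, `pam_at`) are hypothetical, and `TOY_PAM1` is a made-up 2x2 row-stochastic matrix, not real Dayhoff data. Note that the products only make sense for the mutation probability matrices, not for log-odds score matrices.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, exponent):
    """Raise a square matrix to a non-negative integer power (naive loop)."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(exponent):
        result = mat_mul(result, m)
    return result


def decompose(distance, steps=(100, 10, 1)):
    """Greedily split a PAM distance into stored step sizes.

    decompose(120) -> [100, 10, 10]
    """
    factors = []
    for step in steps:
        count, distance = divmod(distance, step)
        factors.extend([step] * count)
    if distance != 0:
        raise ValueError("distance cannot be built from the given steps")
    return factors


def pam_at(distance, stored):
    """Build the matrix for `distance` by multiplying stored matrices.

    `stored` maps a step size (e.g. 1, 10, 100) to its matrix.
    """
    n = len(next(iter(stored.values())))
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for step in decompose(distance, tuple(sorted(stored, reverse=True))):
        result = mat_mul(result, stored[step])
    return result


# Toy stand-in for a real 20x20 PAM1 probability matrix (hypothetical values).
TOY_PAM1 = [[0.99, 0.01],
            [0.02, 0.98]]

# In practice these would be stored constants; here they are derived from the toy.
STORED = {1: TOY_PAM1,
          10: mat_pow(TOY_PAM1, 10),
          100: mat_pow(TOY_PAM1, 100)}

pam120 = pam_at(120, STORED)
```

With the three stored matrices, any distance below 1000 needs at most 27 multiplications, which is cheap for 20x20 matrices.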
For BLOSUMs you'll just be storing all the various predefined values.
I'd suggest a SimilarityMatrix::PAM/BLOSUM naming scheme, all inheriting
from SimilarityMatrixI of course.
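One possible shape for that hierarchy, sketched in Python purely for illustration (the real modules would be Perl packages). The `score` method, the constructor arguments, and the toy score values are all assumptions made for this example; none of them come from an existing BioPerl interface.

```python
from abc import ABC, abstractmethod


class SimilarityMatrixI(ABC):
    """Hypothetical interface: anything that can score a residue pair."""

    @abstractmethod
    def score(self, a, b):
        """Return the similarity score for residues a and b."""


class StoredSimilarityMatrix(SimilarityMatrixI):
    """Shared behaviour for matrices backed by a predefined table."""

    def __init__(self, name, table):
        self.name = name
        self._table = table  # maps (residue, residue) -> score

    def score(self, a, b):
        # Tables are symmetric, so only one orientation needs storing.
        if (a, b) in self._table:
            return self._table[(a, b)]
        return self._table[(b, a)]


class PAM(StoredSimilarityMatrix):
    """Stands in for SimilarityMatrix::PAM."""


class BLOSUM(StoredSimilarityMatrix):
    """Stands in for SimilarityMatrix::BLOSUM."""


# Toy table with placeholder scores (not real BLOSUM values).
toy = BLOSUM("toy", {("A", "A"): 2, ("A", "W"): -1, ("W", "W"): 5})
```

Callers would then depend only on the interface, so a PAM matrix computed at run time and a stored BLOSUM table are interchangeable.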
Later we can think about InstantaneousRateMatrix::PAM modules that do fun
stuff including generate SimilarityMatrix::PAM objects at desired
distances (i.e. do the matrix-ey eigen-stuff above), but until BioPerl
moves into the molecular evolution arena I'm not too concerned.
-Aaron