[Bioperl-l] Bio::Matrix::Substitution alpha version
Allen Smith
easmith@beatrice.rutgers.edu
Fri, 27 Sep 2002 20:53:47 -0400
Hi. I have an alpha version of Bio::Matrix::Substitution and
Bio::Matrix::SubstitutionI ready for public inspection. It includes not only
these modules (which do have POD documentation) but also a t/MatrixSubstitution.t
test file and a couple of data files for testing in the t/data/
subdirectory.
Three things:
A. What is the recommended way to submit code like this? SSH is not
readily available to me locally, incidentally, since IRIX lacks a
/dev/random. I can easily make the files available over HTTP,
though.
B. This is _just_ those two modules. Incorporation into the rest of
Bioperl (including SimpleAlign and maybe Bio::Tools::OddCodes) is
a further project. Two major things will be needed for this:
1. To efficiently match "these are the AAs/whatever that are
closely related according to the matrix" to "these are
the AAs/whatever that we _have_", as in the
substitution/consensus groups in SimpleAlign and
Bio::Tools::OddCodes, the best method I've come up with
is to convert the presence/absence of each AA/whatever
into a bit in a bitstring, using vec, and then apply bit
operations. This is considerably faster than using
regexes or Set::Scalar. It would be best by far
if this were also made into a new module (or set of
modules). Anyone have a good name?
2. A proper means of associating matrices with alignments
_and with sequences_, and having this be easily
extensible to associate "this matrix is the best one to
use in these spots along this sequence, unless the other
sequence says to use something else" (as in, for
instance, being able to take a sequence for which the
structure is known and use different matrices between
alpha helical regions, beta sheet regions, etcetera, when
matching/aligning to a homologous sequence for which the
structure has not been determined). The existing
Bio::Range modules don't appear to quite meet the
requirements for this: one needs to be able to say
"this matrix should be used for positions
3-5,7-9,11-13..." (e.g., if there's a partially buried
structure and one is using matrices that vary with the
degree of solvent exposure) without getting into an
insane number of separate objects. Thoughts?
C. I noticed another minor bug in SimpleAlign and have put it into
the bug tracker (with a patch that fixes the problem). The bug is
that SimpleAlign's consensus_iupac routine, when expanding
previous IUPAC symbols to the corresponding set of possible NAs,
wasn't always doing so when necessary: it performed the one-to-two
expansions when a regexp matched [SKYWM], when it should have
been matching [SKYRWM] (R, i.e. A or G, was missing).
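To make the bitstring idea in B.1 concrete, here is a rough sketch. It is written in Python rather than Perl, so plain integers stand in for the vec()-built bitstring; the names AMINO_ACIDS, to_mask, all_in_group, any_in_group and members are hypothetical, and the {I, L, V, M} group is only an illustrative grouping, not one taken from any particular matrix.

```python
# Hypothetical sketch: Python ints used as bitstrings in place of Perl's vec().
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes
BIT = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def to_mask(residues):
    """Set one bit per residue present; duplicates and gaps are ignored."""
    mask = 0
    for r in residues.upper():
        if r in BIT:
            mask |= 1 << BIT[r]
    return mask


def all_in_group(present_mask, group_mask):
    """True if every residue present belongs to the related group."""
    return present_mask & ~group_mask == 0


def any_in_group(present_mask, group_mask):
    """True if at least one residue present belongs to the group."""
    return present_mask & group_mask != 0


def members(mask):
    """Decode a mask back into residue letters, in AMINO_ACIDS order."""
    return "".join(aa for aa in AMINO_ACIDS if mask >> BIT[aa] & 1)


if __name__ == "__main__":
    # Illustrative group only; real groups would be derived from the matrix.
    hydrophobic = to_mask("ILVM")
    column = to_mask("IIVL-")  # residues seen in one alignment column
    print(all_in_group(column, hydrophobic))  # True
    print(members(column & hydrophobic))      # ILV
```

Each group test is a single AND (plus a NOT for "all present are in the group"), independent of how many sequences contributed to the column, which is where the speed advantage over per-residue regex or set lookups comes from.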
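One possible answer to the "positions 3-5,7-9,11-13..." question in B.2 is a single object that stores many ranges per matrix and looks positions up by binary search, rather than one range object per stretch. The sketch below is in Python and is not based on Bio::Range; the class PositionMatrixMap, its methods, and the matrix names "buried" and "exposed" are all hypothetical. It assumes 1-based, inclusive, non-overlapping ranges.

```python
import bisect


class PositionMatrixMap:
    """Hypothetical: one object mapping many position ranges to matrix names.

    Ranges are 1-based and inclusive, as in "3-5,7-9,11-13", and are assumed
    not to overlap. Positions not covered fall back to a default matrix.
    """

    def __init__(self, default):
        self.default = default
        self._starts = []   # sorted range starts, searched with bisect
        self._entries = []  # (start, end, matrix), kept in the same order

    def add(self, spec, matrix):
        """Assign `matrix` to every range in a spec like "3-5,7-9,11-13"."""
        for part in spec.split(","):
            lo, _, hi = part.strip().partition("-")
            start, end = int(lo), int(hi or lo)  # "6" means 6-6
            i = bisect.bisect_left(self._starts, start)
            self._starts.insert(i, start)
            self._entries.insert(i, (start, end, matrix))

    def matrix_at(self, pos):
        """Return the matrix name that applies at a 1-based position."""
        i = bisect.bisect_right(self._starts, pos) - 1
        if i >= 0:
            start, end, matrix = self._entries[i]
            if start <= pos <= end:
                return matrix
        return self.default


if __name__ == "__main__":
    m = PositionMatrixMap("BLOSUM62")
    m.add("3-5,7-9,11-13", "buried")  # hypothetical matrix names
    m.add("6", "exposed")
    print([m.matrix_at(p) for p in (4, 6, 10)])
```

Because the ranges live in two parallel sorted lists, a partially buried structure with dozens of short stretches still costs one object, and each lookup is logarithmic in the number of stretches.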
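The consequence of the character-class bug in C can be illustrated with a small Python sketch. This is not the SimpleAlign code; the names TWO_BASE, BUGGY, FIXED and expand are hypothetical. It uses the six standard two-base IUPAC codes and shows that the [SKYWM] class never matches R, so R is left unexpanded.

```python
import re

# Standard IUPAC two-base ambiguity codes.
TWO_BASE = {"R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC"}

BUGGY = re.compile(r"[SKYWM]")   # class described in the bug report: R missing
FIXED = re.compile(r"[SKYRWM]")  # corrected class: all six two-base codes


def expand(symbol, pattern=FIXED):
    """Expand a two-base IUPAC code to its bases if `pattern` recognises it."""
    if pattern.fullmatch(symbol):
        return set(TWO_BASE[symbol])
    return {symbol}  # anything else is passed through unchanged


if __name__ == "__main__":
    print(sorted(expand("R")))         # ['A', 'G']
    print(sorted(expand("R", BUGGY)))  # ['R']
```

With the buggy class, an R carried over from an earlier step is treated as a literal symbol rather than as A or G, which is the kind of missed expansion the patch addresses.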
-Allen
--
Allen Smith http://cesario.rutgers.edu/easmith/