[Bioperl-l] Bio::Matrix::Substitution alpha version

Jason Stajich jason@cgt.mc.duke.edu
Sun, 29 Sep 2002 15:20:52 -0400 (EDT)


On Fri, 27 Sep 2002, Allen Smith wrote:

> Hi. I have an alpha version of Bio::Matrix::Substitution and
> Bio::Matrix::SubstitutionI ready for public inspection. It includes not only
> these modules (which do have POD documentation) but a t/MatrixSubstitution.t
> test file and a couple of data files for testing in the t/data/
> subdirectory.
>
> Three things:
> 	A. What's the recommended means of submission of such? SSH is not
> 	   locally particularly available, thanks to IRIX not having a
> 	   /dev/random, incidentally. I can make it available for HTTP
> 	   access without problems, BTW.

I'd rather that you own this code and commit it directly to the CVS
repository once we've reviewed it.  The submitted-patches system requires
too much bandwidth from core developers, so we'd rather people put code in
and allow others to make suggestions and improvements.  This is done with
CVS over SSH, so you'll need to decide whether your concerns over non-secure
SSH from IRIX are a big enough deal to prevent you from contributing.  We
can contact you off list about getting an account for this.  If you want to
make it available via HTTP in the meantime, we can look it over and commit
it.

> 	B. This is _just_ those two modules. Incorporation into the rest of
> 	   Bioperl (including SimpleAlign and maybe Bio::Tools::OddCodes) is
> 	   a further project. Two major things will be needed for this:

> 		1. To efficiently match "these are the AAs/whatever that are
> 		   closely related according to the matrix" to "these are
> 		   the AAs/whatever that we _have_", as in the
> 		   substitution/consensus groups in SimpleAlign and
> 		   Bio::Tools::OddCodes, the best method I've come up with
> 		   is conversion of the presence/absence of each AA/whatever
> 		   into a bit in a bitstring, using vec, followed by bit
> 		   operations. This is considerably faster than using
> 		   regexes or Set::Scalar. It would be by far the best
> 		   if this were also made into a new module (or set of
> 		   modules). Anyone have a good name?
> 		2. A proper means of associating matrices with alignments
> 		   _and with sequences_, and having this be easily
> 		   extensible to associate "this matrix is the best one to
> 		   use in these spots along this sequence, unless the other
> 		   sequence says to use something else" (as in, for
> 		   instance, being able to take a sequence for which the
> 		   structure is known and use different matrices between
> 		   alpha helical regions, beta sheet regions, etcetera, when
> 		   matching/aligning to a homologous sequence for which the
> 		   structure has not been determined). The existing
> 		   Bio::Range stuff doesn't appear to quite match up with
> 		   the requirements for this - one needs to be able to say
> 		   "this matrix should be used for positions
> 		   3-5,7-9,11-13..." (e.g., if there's a partially-buried

You can use Bio::Location::Split instead of Bio::Range to represent
non-contiguous ranges.
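
To illustrate the idea of one object covering several sub-ranges, here is a
minimal sketch in Python.  It is not the BioPerl API: the SplitRange class,
its methods, and the matrix name are all hypothetical, and positions are
assumed to be 1-based and inclusive.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SplitRange:
    """Hypothetical stand-in for a split location: one object, many sub-ranges."""
    parts: List[Tuple[int, int]]  # inclusive (start, end) pairs, 1-based

    def contains(self, pos: int) -> bool:
        # True if pos falls inside any of the sub-ranges.
        return any(start <= pos <= end for start, end in self.parts)

    def positions(self) -> List[int]:
        # Expand the sub-ranges into an explicit list of positions.
        return [p for start, end in self.parts for p in range(start, end + 1)]


# "this matrix should be used for positions 3-5,7-9,11-13"
exposed = SplitRange([(3, 5), (7, 9), (11, 13)])

# One matrix maps to one split range, not to one object per position.
matrix_for = {"exposed_matrix": exposed}  # hypothetical matrix name
```

The point is only that the number of objects grows with the number of
matrices, not with the number of positions or segments.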

> 		   structure and one is using matrices that vary depending
> 		   on degree of solvent exposure) without getting into an
> 		   insane number of separate objects. Thoughts?
> 	C. I noted a slight further bug in SimpleAlign, and have put it into
> 	   the bug-tracker (with a patch to solve the problem). The bug is
> 	   that SimpleAlign's consensus_iupac routine, when expanding
> 	   previous IUPAC symbols to the corresponding possible set of NAs,
> 	   wasn't always doing so when necessary (it did the one-to-two set
> 	   of expansions when a regexp matched [SKYWM] when it should have
> 	   been matching [SKYRWM]).
>
> 	-Allen
>
>
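
For readers following the bit-vector idea in B.1, a rough sketch of the
technique follows.  It is written in Python with plain integers as bitmasks
rather than Perl's vec, so it shows the approach and not the actual code.  The
residue ordering and the "ILMV" group are made up for the example and are not
taken from any substitution matrix.

```python
# One bit per amino acid; the ordering here is arbitrary (an assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BIT = {aa: 1 << i for i, aa in enumerate(AMINO_ACIDS)}


def to_mask(residues):
    """Encode the presence/absence of each residue as bits in an integer."""
    mask = 0
    for aa in residues:
        mask |= BIT[aa]
    return mask


def from_mask(mask):
    """Decode a bitmask back into a string of residues."""
    return "".join(aa for aa in AMINO_ACIDS if mask & BIT[aa])


# Hypothetical group of residues the matrix treats as closely related.
related = to_mask("ILMV")
# Residues actually observed, e.g. in one alignment column.
column = to_mask("ILV")

# The column fits the group if it has no bits outside the group.
is_subset = (column & ~related) == 0
# Residues that are both observed and in the group.
shared = from_mask(column & related)
```

Each group test is then a couple of integer operations, whatever the size of
the group.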

-- 
Jason Stajich
Duke University
jason at cgt.mc.duke.edu