[Biojava-l] Substitution matrices

Matthew Pocock matthew_pocock@yahoo.co.uk
Fri, 10 May 2002 12:20:29 +0100


Hi Mark,

Pairwise alignments are performed using the org.biojava.bio.dp.twohead 
package. The PairwiseDP class extends DP. It is constructed with an HMM 
and a CellCalculatorFactoryMaker. There are two ccf-maker classes - 
DPInterpreter and DPCompiler. The interpreter has an inner loop rather
like the loop in the single-head DP implementation. DPCompiler uses a
bytecode generator to build code on the fly that implements the pairwise 
alignment algorithm.

Pairwise alignments are represented by MarkovModel instances with the
heads property set to two. This means that the model expects to consume
two independent sequences. The advance property of each state will be an
array of two integers (1 or 0) such that match states have {1,1} and
insert/delete states have {1,0} or {0,1} respectively. The
probability distributions will be over the same alphabet as the model,
which is the product of the alphabets of the two sequences being
aligned. To align protein to protein, the model and the distributions
will be over pXp. Insert/delete states are modeled by using a
GapDistribution and a normal distribution instance (e.g. over protein)
combined in a PairDistribution.
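
To make the advance arrays described above concrete, here is a minimal
stand-alone Java sketch. It deliberately does not use the BioJava API:
PairStateSketch, State, emit and consumed are hypothetical names invented
for this illustration, and the '-' character merely stands in for what a
gap symbol would represent. It only shows how {1,1}, {1,0} and {0,1} decide
which of the two sequences a state consumes from.

```java
// Hypothetical illustration only; none of these are BioJava classes.
final class PairStateSketch {
    /** A state of a two-head model: a name plus how far each head advances per emission. */
    static final class State {
        final String name;
        final int[] advance; // {step in first sequence, step in second sequence}, each 0 or 1

        State(String name, int advanceFirst, int advanceSecond) {
            this.name = name;
            this.advance = new int[] {advanceFirst, advanceSecond};
        }
    }

    static final State MATCH = new State("match", 1, 1);
    static final State INSERT = new State("insert", 1, 0);
    static final State DELETE = new State("delete", 0, 1);

    /** Renders the symbol pair a state emits at positions (i, j); '-' marks the head that does not move. */
    static String emit(State s, String first, int i, String second, int j) {
        char a = s.advance[0] == 1 ? first.charAt(i) : '-';
        char b = s.advance[1] == 1 ? second.charAt(j) : '-';
        return "" + a + b;
    }

    /** Walks a state path and returns how many symbols were consumed from each sequence. */
    static int[] consumed(State[] path) {
        int[] pos = {0, 0};
        for (State s : path) {
            pos[0] += s.advance[0];
            pos[1] += s.advance[1];
        }
        return pos;
    }

    public static void main(String[] args) {
        State[] path = {MATCH, INSERT, MATCH, DELETE};
        int[] pos = consumed(path);
        System.out.println(pos[0] + " symbols from the first sequence, "
                + pos[1] + " from the second");
    }
}
```

Each emitted pair is one symbol of the product alphabet, which is why the
model and its distributions are defined over pXp rather than over p.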

Take a look at the dp.PairwiseAlignment class in demos to see a simple
example of Smith-Waterman-style alignments. Do send us further queries;
this end of the toolkit never got sufficiently documented.
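
For orientation, a Smith-Waterman-style local alignment can be sketched in
plain Java as follows. This is the classical score-based recurrence, not
the HMM-based code in dp.twohead and not the demo itself. LocalAlignSketch,
the +2/-1 scoring and the linear gap penalty are all assumptions made for
the example; a PAM or BLOSUM lookup would take the place of score().

```java
// Classical score-based local alignment; an illustration, not the BioJava implementation.
final class LocalAlignSketch {
    /** Hypothetical scoring: +2 for identical symbols, -1 otherwise. */
    static int score(char a, char b) {
        return a == b ? 2 : -1;
    }

    /** Returns the best local alignment score of x against y with a linear gap penalty. */
    static int bestLocalScore(String x, String y, int gapPenalty) {
        // h[i][j] = best score of a local alignment ending at x[i-1], y[j-1];
        // row 0 and column 0 stay at 0.
        int[][] h = new int[x.length() + 1][y.length() + 1];
        int best = 0;
        for (int i = 1; i <= x.length(); i++) {
            for (int j = 1; j <= y.length(); j++) {
                // The three moves mirror the advance arrays {1,1}, {1,0} and {0,1}.
                int match = h[i - 1][j - 1] + score(x.charAt(i - 1), y.charAt(j - 1));
                int insert = h[i - 1][j] - gapPenalty;
                int delete = h[i][j - 1] - gapPenalty;
                // Clamping at 0 lets an alignment restart anywhere, which makes it local.
                h[i][j] = Math.max(0, Math.max(match, Math.max(insert, delete)));
                best = Math.max(best, h[i][j]);
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(bestLocalScore("ACGT", "ACGT", 2));
    }
}
```

In the HMM formulation the same three moves appear as match, insert and
delete states, with emission and transition probabilities in place of the
fixed scores used here.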

Matthew

ps Backwards scores with the compiler are numerically unstable - I can't
find the bug myself (despite spending many hours up to my elbows in the
bytecode).

Schreiber, Mark wrote:
> Actually this leads on to the issue of how the DP package does a
> pairwise alignment between two protein sequences (or does it)? Does it
> use some sort of PAM or BLOSUM matrix?
> 
> If it does could someone write a demo of it?
> 
> - Mark
>