[Biojava-l] Partial path probs

Thomas Down thomas at derkholm.net
Fri Jul 18 00:50:09 EDT 2003


Once upon a time, Schreiber, Mark wrote:
> > An alternate approach, which BioJava *will* help you with, is 
> > to calculate both the forward and the backward DP matrices, 
> > then multiply these together and normalize by the overall 
> > backwards/forwards probability.  This then tells you the 
> 
> By normalize do you mean divide each value by the total forward or
> backward probability? So by doing this I would expect that the
> forward-backward prob should equal the forward and the backward probs?

The value of each cell in the forward * backward matrix is the
probability of observing the data given that a particular symbol
was emitted by a particular state.  The overall forward score
is simply the probability of the data given any path through
the model.  So the ratio gives the probability that the `true'
path passed through a particular cell (assuming a uniform
prior, if you want to be nicely Bayesian).

> Also, can someone enlighten me how the matrices are indexed (there is
> not much javadoc in DP)? Currently I'm assuming that the x axis is the
> array returned by getStates and the y axis is the aligned SymbolList
> from that state path, with a magical state at each end but I'm not too
> clear on this.

I'm not sure what you mean by x and y here.  When you do
forwards/backwards, you don't have any state path to use
as a frame of reference, since you're summing over all
possible state paths.  The indices into the array are just
indices into the SymbolList (and yes, positions 0 and length()+1
are magic symbols which only match the model's magical state).
For `gaps' (i.e. paths which pass through states which don't
emit anything in the sequence you're using as a frame of
reference), you'll find the scores at the index of the previous
symbol which *was* emitted.  So long as your model doesn't have
any cycles of states which don't emit any symbols, this should
always be unambiguous.

For the 1-head DP case, there's a nice example of all this
in BaumWelchTrainer.java.  There isn't anything equivalent
for 2-head, but the principles should be closely analogous.


     Thomas.


More information about the Biojava-l mailing list