[Biojava-l] Partial path probs
Thomas Down
thomas at derkholm.net
Fri Jul 18 00:50:09 EDT 2003
Once upon a time, Schreiber, Mark wrote:
> > An alternate approach, which BioJava *will* help you with, is
> > to calculate both the forward and the backward DP matrices,
> > then multiply these together and normalize by the overall
> > backwards/forwards probability. This then tells you the
>
> By normalize do you mean divide each value by the total forward or
> backward probability? So by doing this I would expect that the
> forward-backward prob should equal the forward and the backward probs?
The value of each cell in the forward * backward matrix is the
probability of observing the data given that a particular symbol
was emitted by a particular state. The overall forward score
is simply the probability of the data given any path through
the model. So the ratio gives the probability that the `true'
path passed through a particular cell (assuming a uniform
prior, if you want to be nicely Bayesian).
> Also, can someone enlighten me how the matrices are indexed (there is
> not much javadoc in DP)? Currently I'm assuming that the x axis is the
> array returned by getStates and the y axis is the aligned SymbolList
> from that state path, with a magical state at each end but I'm not too
> clear on this.
I'm not sure what you mean by x and y here. When you do
forwards/backwards, you don't have any state path to use
as a frame of reference, since you're summing over all
possible state paths. The indices into the array are just
indices into the SymbolList (and yes, positions 0 and length()+1
are magic symbols which only match the model's magical state).
For `gaps' (i.e. paths which pass through states which don't
emit anything in the sequence you're using as a frame of
reference), you'll find the scores at the index of the previous
symbol which *was* emitted. So long as your model doesn't have
any cycles of states which don't emit any symbols, this should
always be unambiguous.
For the 1-head DP case, there's a nice example of all this
in BaumWelchTrainer.java. There isn't anything equivalent
for 2-head, but the principles should be closely analogous.
Thomas.
More information about the Biojava-l
mailing list