[Biojava-l] Partial path probs

Thu Jul 17 10:21:25 EDT 2003

On Thu, Jul 17, 2003 at 01:24:41PM +1200, Schreiber, Mark wrote:
> 
> On a more theoretical note, if one calculated the probability of the
> Viterbi path and compared that to the forward probability, would that be
> a good way of inferring confidence in the prediction's fit to the model?
> E.g. if the Viterbi path prob and the forward prob were close you could be
> confident that, according to your model, other possible paths are not
> that likely. Alternatively, if they are not close you might conclude that
> although the Viterbi path is the most parsimonious there could be other
> paths that are almost as likely. Or am I barking up the wrong tree here?

Yes, I think that is a meaningful thing to do.  Unfortunately, if
the probabilities *don't* match, it doesn't give you any clues as
to where the missing probability has "gone".  Is it in a set of
paths which are quite similar to the optimal path, or are there
completely different solutions which are almost as probable as
the optimum?
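
To make the comparison concrete, here is a minimal sketch in plain
Python (not the BioJava API).  The two-state model, its numbers, and
the names path_score and viterbi_share are all made up for
illustration.  It relies on the fact that the forward and Viterbi
recursions differ only in whether incoming paths are summed or
maximized.

```python
# Toy two-state HMM; every name and number below is an illustrative assumption.
STATES = ("fair", "loaded")
START = {"fair": 0.5, "loaded": 0.5}
TRANS = {"fair": {"fair": 0.95, "loaded": 0.05},
         "loaded": {"fair": 0.10, "loaded": 0.90}}
EMIT = {"fair": {"H": 0.5, "T": 0.5},
        "loaded": {"H": 0.9, "T": 0.1}}


def path_score(obs, combine):
    """Run the DP recursion over obs, merging incoming paths with `combine`.

    combine=sum gives the forward probability (all paths added up);
    combine=max gives the probability of the single best (Viterbi) path.
    Plain probabilities are used for brevity; long sequences would need
    log space or scaling to avoid underflow.
    """
    col = {s: START[s] * EMIT[s][obs[0]] for s in STATES}
    for sym in obs[1:]:
        col = {s: EMIT[s][sym] * combine(col[r] * TRANS[r][s] for r in STATES)
               for s in STATES}
    return combine(col.values())


def viterbi_share(obs):
    """Fraction of the total probability carried by the Viterbi path."""
    return path_score(obs, max) / path_score(obs, sum)
```

A share near 1 means the best path dominates; a small share means
the probability is spread over other paths, though, as noted above,
it does not say which ones.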

Ideally, you want an algorithm for sampling from the distribution
of likely paths.  I've never encountered one of these, but I think
there may have been some work done on this.  I know people who
are into protein structure prediction are sometimes interested
in sub-optimal sequence alignments.
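
One candidate for such a sampler is a stochastic traceback over the
forward matrix: pick the final state in proportion to its forward
value, then walk backwards, picking each predecessor in proportion
to its forward value times the transition into the state already
chosen.  The sketch below is plain Python on a made-up two-state
model (the model and the names forward_matrix and sample_path are
assumptions for illustration, not anything BioJava provides).

```python
import random

# Toy two-state HMM; every name and number below is an illustrative assumption.
STATES = ("fair", "loaded")
START = {"fair": 0.5, "loaded": 0.5}
TRANS = {"fair": {"fair": 0.95, "loaded": 0.05},
         "loaded": {"fair": 0.10, "loaded": 0.90}}
EMIT = {"fair": {"H": 0.5, "T": 0.5},
        "loaded": {"H": 0.9, "T": 0.1}}


def forward_matrix(obs):
    """Return one dict per position: state -> forward probability."""
    cols = [{s: START[s] * EMIT[s][obs[0]] for s in STATES}]
    for sym in obs[1:]:
        prev = cols[-1]
        cols.append({s: EMIT[s][sym] * sum(prev[r] * TRANS[r][s] for r in STATES)
                     for s in STATES})
    return cols


def sample_path(obs, rng=random):
    """Draw one state path given obs, by stochastic traceback."""
    cols = forward_matrix(obs)
    # Final state: drawn in proportion to its forward value.
    state = rng.choices(STATES, weights=[cols[-1][s] for s in STATES])[0]
    path = [state]
    # Earlier states: weight each predecessor by forward value times the
    # transition into the state chosen one step later.
    for t in range(len(obs) - 2, -1, -1):
        weights = [cols[t][r] * TRANS[r][state] for r in STATES]
        state = rng.choices(STATES, weights=weights)[0]
        path.append(state)
    path.reverse()
    return path
```

Drawing a few hundred paths and seeing how much they differ from the
Viterbi path would give a rough picture of where the remaining
probability sits.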

It's possible that the variational inference view of HMMs could
help, too.

An alternative approach, which BioJava *will* help you with, is to
calculate both the forward and the backward DP matrices, multiply
them together cell by cell, and normalize by the overall probability
of the sequence (the forward and backward totals are the same number).
This tells you the probability that a given symbol in the sequence
was generated by a given state in the model, considering all possible
paths.  Depending on exactly what you're trying to do, this might
well be the confidence figure you're looking for.  (This is also the
expectation stage of Baum-Welch HMM training.)
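
As a rough illustration of that calculation, here is a sketch in
plain Python rather than BioJava's own DP classes.  The two-state
model, its numbers, and the function name posteriors are assumptions
made up for the example, and it works in plain probabilities, so it
is only suitable for short sequences.

```python
# Toy two-state HMM; every name and number below is an illustrative assumption.
STATES = ("fair", "loaded")
START = {"fair": 0.5, "loaded": 0.5}
TRANS = {"fair": {"fair": 0.95, "loaded": 0.05},
         "loaded": {"fair": 0.10, "loaded": 0.90}}
EMIT = {"fair": {"H": 0.5, "T": 0.5},
        "loaded": {"H": 0.9, "T": 0.1}}


def posteriors(obs):
    """Return, for each position, a dict state -> P(state at position | obs)."""
    # Forward matrix: probability of the symbols so far, ending in each state.
    fwd = [{s: START[s] * EMIT[s][obs[0]] for s in STATES}]
    for sym in obs[1:]:
        prev = fwd[-1]
        fwd.append({s: EMIT[s][sym] * sum(prev[r] * TRANS[r][s] for r in STATES)
                    for s in STATES})

    # Backward matrix: probability of the remaining symbols, given each state.
    bwd = [dict.fromkeys(STATES, 1.0)]
    for sym in reversed(obs[1:]):
        nxt = bwd[0]
        bwd.insert(0, {r: sum(TRANS[r][s] * EMIT[s][sym] * nxt[s] for s in STATES)
                       for r in STATES})

    # Multiply cell by cell and normalize by the total sequence probability.
    total = sum(fwd[-1].values())
    return [{s: fwd[t][s] * bwd[t][s] / total for s in STATES}
            for t in range(len(obs))]
```

Each returned dict sums to 1, so a position where one state has a
value close to 1 is one where the model is confident about the
labelling, whichever path you look at.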

    Thomas.