[Biojava-l] Problems with Baum-Welch training

Fri Aug 8 15:57:45 EDT 2003

Hi,

I construct a profile HMM with randomized uniform distributions. I then train 
the model and, regardless of the training set I use, it ends up looking like 
this:
- a 0.999 probability to go from one delete state to the next
- roughly a uniform emission distribution for emission states (insert and match)
- about a 50% probability of going from an insert state to a delete state, and 
a 25% probability each of going to a match state or an insert state

The result of aligning a sequence to a trained model is as follows:

------------------- + original sequence

In other words, the model treats every sequence as something it could not have 
produced: the sequence goes through all the delete states in the model and is 
emitted in its entirety by the final insert state.
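To make that output concrete, here is a minimal sketch of how such a state path 
turns into the gapped line shown above. It does not use the BioJava API; the 
class name, the State enum and both helper methods are hypothetical and exist 
only for illustration. It assumes that a delete state prints a gap without 
consuming a symbol, and that match and insert states each consume one symbol.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: not BioJava code. All names here are hypothetical.
final class DegenerateAlignmentSketch {

    // The three kinds of state in a profile HMM column.
    enum State { MATCH, INSERT, DELETE }

    // Builds the degenerate path described in the post: one delete state per
    // model column, followed by one insert-state visit per sequence symbol.
    static List<State> degeneratePath(int modelColumns, int sequenceLength) {
        List<State> path = new ArrayList<>();
        for (int i = 0; i < modelColumns; i++) {
            path.add(State.DELETE);
        }
        for (int i = 0; i < sequenceLength; i++) {
            path.add(State.INSERT);
        }
        return path;
    }

    // Renders a state path against a sequence. Delete states emit nothing,
    // so they appear as '-'; match and insert states print the next symbol.
    static String render(List<State> path, String sequence) {
        StringBuilder out = new StringBuilder();
        int next = 0;
        for (State s : path) {
            if (s == State.DELETE) {
                out.append('-');
            } else {
                if (next >= sequence.length()) {
                    throw new IllegalArgumentException("path emits more symbols than the sequence has");
                }
                out.append(sequence.charAt(next++));
            }
        }
        if (next != sequence.length()) {
            throw new IllegalArgumentException("path does not emit the whole sequence");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumed toy sizes: a 5-column model and a 9-symbol sequence.
        String sequence = "ACDEFGHIK";
        List<State> path = degeneratePath(5, sequence.length());
        System.out.println(render(path, sequence)); // prints "-----ACDEFGHIK"
    }
}
```

With a sensible model, most of the path would consist of match states, so the 
rendered line would interleave symbols and occasional gaps rather than put 
every gap in front of the untouched sequence.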

The raw sequences I use should align in a sensible way and are based on an 
existing seed alignment. Has anyone encountered a similar problem? Is there 
something obvious that I may be doing wrong?

Many thanks,

Henry