[Biojava-l] HMM

Andreas Draeger andreas.draeger at uni-tuebingen.de
Sun Jan 28 22:01:10 UTC 2007


I have a question regarding HMMs. I created a custom HMM following
the Dice example on the web site.
It works fine and I can either generate sequences or the corresponding
state path. However, I would like to train the model and to get the
probability that a certain sequence was produced by this model. I
tried the following:

try {
       DP dp = DPFactory.DEFAULT.createDP(createMyModel());
       StatePath obs_rolls = dp.generate(4);
       SymbolList roll_sequence = obs_rolls.symbolListForLabel(StatePath.SEQUENCE);
       SymbolList[] res_array = { roll_sequence };
       StatePath v = dp.viterbi(res_array, ScoreType.PROBABILITY);

       BaumWelchTrainer bwt = new BaumWelchTrainer(dp);
       // Stop training after 100 cycles.
       StoppingCriteria sc = new StoppingCriteria() {
         public boolean isTrainingComplete(TrainingAlgorithm arg0) {
           if (arg0.getCycle() > 100)
           //if (Math.abs(arg0.getLastScore() - arg0.getCurrentScore()) < 0.5)
             return true;
           return false;
         }
       };

       try {
         BufferedReader br = new BufferedReader(new FileReader(args[0]));
         SequenceDB db = new HashSequenceDB();
         myAlphabet.putTokenization("token", new NameTokenization(myAlphabet, true));
         // Read one training sequence per line of the input file.
         while (br.ready()) {
           String line = br.readLine();
           SymbolList sym = new SimpleSymbolList(myAlphabet.getTokenization("token"), line);
           db.addSequence(new SimpleSequence(sym, "", line.replaceAll(" ", ""), Annotation.EMPTY_ANNOTATION));
         }
         bwt.train(db, 0.1, sc);
         // Score every training sequence with the trained model.
         for (Iterator i=db.ids().iterator(); i.hasNext(); ) {
           Sequence seq = db.getSequence(i.next().toString());
           System.out.println(
               bwt.getDP().forward(new SymbolList[] {seq}, ScoreType.PROBABILITY));
         }
       } catch (FileNotFoundException e) {
         e.printStackTrace();
       } catch (IOException e) {
         e.printStackTrace();
       } catch (ChangeVetoException e) {
         e.printStackTrace();
       }

       SymbolList realstates = obs_rolls.symbolListForLabel(StatePath.STATES);
       SymbolList realsymbols = obs_rolls.symbolListForLabel(StatePath.SEQUENCE);
       SymbolList states = v.symbolListForLabel(StatePath.STATES);
       SymbolList symbols = v.symbolListForLabel(StatePath.SEQUENCE);// */

       System.out.println("Output:\t" + realsymbols.seqString());
       System.out.println("Position:\t" + realstates.seqString());
       System.out.println("Probability:\t" + dp.forward(new SymbolList[] {realsymbols}, ScoreType.PROBABILITY));

     } catch (IllegalArgumentException e) {
       e.printStackTrace();
     } catch (BioException e) {
       e.printStackTrace();
     }

In createMyModel() I create my custom model, which is a modified
version of the aforementioned example.
When I comment out the line bwt.train(db, 0.1, sc); the output of the line

System.out.println("Probability:\t" + dp.forward(new SymbolList[] {realsymbols}, ScoreType.PROBABILITY));

will give negative probabilities like

Probability:	-5.851716517873089

otherwise (when I use the BaumWelchTrainer) the probabilities will  
even be NaN.
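
A negative value and a NaN would both be consistent with the score being computed in log space, i.e. as the natural logarithm of the probability rather than the probability itself. That is only an assumption here; nothing in this post confirms how forward() scales its result. Under that assumption, the sketch below (plain Java, no BioJava calls; the class and method names are hypothetical) shows how a log score maps back to a value between 0 and 1, and one way log arithmetic can yield NaN when a zero probability is involved.

```java
final class LogScoreDemo {

    // Converts a natural-log score back to an ordinary probability.
    // Assumption: the score really is ln(p).
    static double toProbability(double logScore) {
        return Math.exp(logScore);
    }

    // Computes ln(p / q) as ln(p) - ln(q), as log-space code typically does.
    // If p and q are both 0, this is (-Infinity) - (-Infinity), which is NaN.
    static double logRatio(double p, double q) {
        return Math.log(p) - Math.log(q);
    }

    public static void main(String[] args) {
        // The value reported above; prints roughly 0.0029.
        System.out.println(toProbability(-5.851716517873089));
        // Two zero probabilities combined in log space; prints NaN.
        System.out.println(logRatio(0.0, 0.0));
    }
}
```

Whether a zero weight is what actually produces the NaN after training is not established here; it is just one mechanism worth ruling out, for example by checking that the training sequences contain no symbol the model cannot emit.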

What is the meaning of this? Why are the probabilities not between 0  
and 1 and why does the BaumWelchTrainer produce NaN values?
So my question is: how can I get the probability that the HMM emits a
given sequence, and how can I train the HMM properly?
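
To make the quantity being asked for concrete, here is a minimal sketch of the forward algorithm in plain Java, independent of BioJava. It sums over all state paths to get the probability that a model emits a given sequence. All numbers in it (start, transition and emission weights of a two-state fair/loaded dice model) are hypothetical illustration values, not the parameters of the model from this post, and the sketch does not claim to mirror how BioJava implements forward().

```java
final class ForwardSketch {

    // Hypothetical two-state dice model: state 0 = fair, state 1 = loaded.
    static final double[] START = {0.5, 0.5};
    static final double[][] TRANS = {{0.95, 0.05}, {0.10, 0.90}};
    static final double[][] EMIT = {
        {1 / 6.0, 1 / 6.0, 1 / 6.0, 1 / 6.0, 1 / 6.0, 1 / 6.0},
        {0.1, 0.1, 0.1, 0.1, 0.1, 0.5}
    };

    // Returns ln(exp(a) + exp(b)) without overflow, treating
    // -Infinity (probability 0) explicitly so no NaN is produced.
    static double logAdd(double a, double b) {
        if (a == Double.NEGATIVE_INFINITY) return b;
        if (b == Double.NEGATIVE_INFINITY) return a;
        double max = Math.max(a, b);
        return max + Math.log(Math.exp(a - max) + Math.exp(b - max));
    }

    // Natural-log probability of a non-empty sequence of rolls,
    // given as 0-based faces (0 means a one, 5 means a six).
    static double logForward(int[] rolls) {
        int n = START.length;
        double[] alpha = new double[n];
        for (int s = 0; s < n; s++) {
            alpha[s] = Math.log(START[s]) + Math.log(EMIT[s][rolls[0]]);
        }
        for (int t = 1; t < rolls.length; t++) {
            double[] next = new double[n];
            for (int j = 0; j < n; j++) {
                double sum = Double.NEGATIVE_INFINITY;
                for (int i = 0; i < n; i++) {
                    sum = logAdd(sum, alpha[i] + Math.log(TRANS[i][j]));
                }
                next[j] = sum + Math.log(EMIT[j][rolls[t]]);
            }
            alpha = next;
        }
        double total = Double.NEGATIVE_INFINITY;
        for (double a : alpha) {
            total = logAdd(total, a);
        }
        return total;
    }

    public static void main(String[] args) {
        int[] rolls = {5, 5, 2, 5};
        double logP = logForward(rolls);
        System.out.println("log P = " + logP + ", P = " + Math.exp(logP));
    }
}
```

The result is returned as a logarithm because the raw probability of a long sequence underflows quickly; Math.exp() recovers a value between 0 and 1 for short sequences such as the four rolls generated above.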

I appreciate every answer!


Dipl.-Bioinform. Andreas Dräger
Eberhard Karls University Tübingen
Center for Bioinformatics (ZBIT)
Sand 1
72076 Tübingen

Phone: +49-7071-29-70436
Fax:   +49-7071-29-5091
