[Biojava-l] BaumWelchTrainer Broken??!!! (please help)

Todd Riley toddri at eden.rutgers.edu
Mon Nov 21 18:11:18 EST 2005


Thanks for your response.  Yes, I did set some initial weights before 
starting the BW trainer.  I copied the snippet that uses 
SimpleModelTrainer directly from the profileHMM.htm page.

I have compiled and run the code from 
http://www.biojava.org/docs/bj_in_anger/profileHMM.htm and I get the 
same results as with the other demos and my own code (same result = all 
the distributions are all full of NaN's after BW training.)

This code copied directly from the profileHMM.htm page crashes for me 
(see my output below the code).

Thanks for your assistance,
Todd

*******************************************************************
My file that contains the code from the demo profileHMM.htm found in 
"Biojava In Anger" starts here:
*******************************************************************

/*
 * DemoPHMM.java - Directly from 
http://www.biojava.org/docs/bj_in_anger/profileHMM.htm
 *
 */

import java.util.*;
import java.io.BufferedReader;
import java.io.FileOutputStream;
import java.io.PrintStream;
import java.io.FileReader;
import java.io.IOException;
import java.util.StringTokenizer;
import java.io.File;
import javax.swing.JFrame;
import java.awt.event.*;

//import biojava.*;
//import biojava.BaumWelchTrainer;
//import biojava.TrainingAlgorithm;
import org.biojava.bio.*;
import org.biojava.bio.dist.*;
import org.biojava.bio.dp.*;
import org.biojava.bio.seq.*;
import org.biojava.bio.seq.db.*;
import org.biojava.bio.seq.io.*;
import org.biojava.bio.symbol.*;
import org.biojava.utils.*;

public class DemoPHMM {

    public static void main(String[] args) throws IOException {
    DemoPHMM hmm = new DemoPHMM();
    hmm.letsDoThis(args);
    }


    public void letsDoThis(String[] args) throws IOException {
    if (args.length < 1 || args[0].equals("-help") || 
args[0].equals("-?")) {
        System.out.println("\n Usage: DemoPHMM <Fasta-Training-Set-File>");
        System.exit(-1);
    }

    String trainingSet=args[0];

    try {
        /*
         * Make a profile HMM over the DNA Alphabet with 12 'columns' 
and default
         * DistributionFactories to construct the transition and emmission
         * Distributions
         */
        ProfileHMM hmm = new ProfileHMM(DNATools.getDNA(),
                        20,
                        DistributionFactory.DEFAULT,
                        DistributionFactory.DEFAULT,
                        "my profilehmm");

        //create the Dynamic Programming matrix for the model.
        DP dp = DPFactory.DEFAULT.createDP(hmm);

        //Database to hold the training set
        //SequenceDB db = new HashSequenceDB();
        //code here to load the training set
        SequenceDB db = 
IOUtility.readSequenceDB(trainingSet,DNATools.getDNA());

        //train the model to have uniform parameters
        ModelTrainer mt = new SimpleModelTrainer();
        //register the model to train
        mt.registerModel(hmm);
        //as no other counts are being used the null weight will cause 
everything to be uniform
        mt.setNullModelWeight(1.0);
        mt.train();

        //create a BW trainer for the dp matrix generated from the HMM
        BaumWelchTrainer bwt = new BaumWelchTrainer(dp);

        //anonymous implementation of the stopping criteria interface to 
stop after 20 iterations
        StoppingCriteria stopper = new StoppingCriteria(){
            public boolean isTrainingComplete(TrainingAlgorithm ta){
            System.out.println("\t\tCycle: " + ta.getCycle() + " score: 
" + ta.getCurrentScore() + " " + (ta.getCurrentScore() - 
ta.getLastScore()) );
            return (ta.getCycle() > 20);
            }
        };
   
        /*
         * optimize the dp matrix to reflect the training set in db 
using a null model
         * weight of 1.0 and the Stopping criteria defined above.
         */
        bwt.train(db,1.0,stopper);

        //SymbolList test = null;
        //code here to initialize the test sequence
        Sequence test = 
DNATools.createDNASequence("tacaGAACATGTCTAAGCATGCTGggga", "mySeq");
   
        /*
         * put the test sequence in an array, an array is used because 
for pairwise
         * alignments using an HMM there would need to be two 
SymbolLists in the
         * array
         */
   
        SymbolList[] sla = {(SymbolList)test};
   
        //decode the most likely state path and produce an 'odds' score
        StatePath path = dp.viterbi(sla, ScoreType.ODDS);
        System.out.println("Log Odds = "+path.getScore());

        //print state path
        for(int i = 1; i <= path.length(); i++){
        System.out.println(path.symbolAt(StatePath.STATES, i).getName());
        }
    }
    catch (Exception ex) {
            ex.printStackTrace();
            //System.err.println("symbol is "+symbol);
            //System.err.println("distribution is 
"+StringUtility.distributionToString(emissionDist));
            System.exit(-1);
    }

    }

}

*******************************************************************
My output from running this code above starts here:
*******************************************************************
        Cycle: 1 score: -1105.9598698420707 -Infinity
        Cycle: 2 score: -1000.3026011513825 105.65726869068817
        Cycle: 3 score: NaN NaN
        Cycle: 4 score: NaN NaN
        Cycle: 5 score: NaN NaN
        Cycle: 6 score: NaN NaN
        Cycle: 7 score: NaN NaN
        Cycle: 8 score: NaN NaN
        Cycle: 9 score: NaN NaN
        Cycle: 10 score: NaN NaN
        Cycle: 11 score: NaN NaN
        Cycle: 12 score: NaN NaN
        Cycle: 13 score: NaN NaN
        Cycle: 14 score: NaN NaN
        Cycle: 15 score: NaN NaN
        Cycle: 16 score: NaN NaN
        Cycle: 17 score: NaN NaN
        Cycle: 18 score: NaN NaN
        Cycle: 19 score: NaN NaN
        Cycle: 20 score: NaN NaN
        Cycle: 21 score: NaN NaN
java.lang.NullPointerException
    at org.biojava.bio.dp.onehead.SingleDP.viterbi(SingleDP.java:650)
    at org.biojava.bio.dp.onehead.SingleDP.viterbi(SingleDP.java:513)
    at DemoPHMM.letsDoThis(DemoPHMM.java:103)
    at DemoPHMM.main(DemoPHMM.java:33)

*******************************************************************
My fasta training sequence file here:
*******************************************************************
 >Funk_Sequence_1
GGACATGCCCGGGCATGTT
 >Funk_Sequence_2
GAACATGCCCGGGCATGTCT
 >Funk_Sequence_3
GGACATGCCCGGGCATGTCG
 >Funk_Sequence_4
GGGCATGCCCGGGCATGTCT
 >Funk_Sequence_5
GAACATGCCCGGGCATGTCC
 >Funk_Sequence_6
AAACATGCCCGGGCATGTTC
 >Funk_Sequence_7
GGACATGCCCGGGCATGTCT
 >Funk_Sequence_8
GGACATGCCCGGGCATGTCG
 >Funk_Sequence_9
AAACATGCCCGGGCATGCCC
 >Funk_Sequence_10
GGGCATGCCCGGGCATGTTC
 >Funk_Sequence_11
AGACATGCCCGGGCATGTCT
 >Funk_Sequence_12
GGACATGCCCGGGCATGTCT
 >Funk_Sequence_13
GGACATGCCCGGGCATGCCC
 >Funk_Sequence_14
GGACATGTCCGGACATGTTC
 >Funk_Sequence_15
GGACATGTCCGGACATGTCT
 >Funk_Sequence_16
AAACATGTCCGGGCATGTCC
 >Funk_Sequence_17
GGACATGTCCGGGCATGTCT

 >ElnDeiry_Sequence_1
GGGCCTGTCACAGCATGCCT
 >ElnDeiry_Sequence_2
CTGCATGTCTAGGCAAGTCA
 >ElnDeiry_Sequence_3
AAACATGCCCAGACTTGTCT
 >ElnDeiry_Sequence_4
AGGCATGCCTTTGCCT
 >ElnDeiry_Sequence_5
GGGCATGTTTAGGCAAGCTT
 >ElnDeiry_Sequence_6
AGACATGTTATAACAAGTCA
 >ElnDeiry_Sequence_7
TGACATGTCCCGACGTGTTT
 >ElnDeiry_Sequence_8
AGGCATGTTCGGGCTGTCT
 >ElnDeiry_Sequence_9
TGACTTGCCTTGACATGTTC
 >ElnDeiry_Sequence_10
CAGCTGCCAAGGCATGCAG
 >ElnDeiry_Sequence_11
CAACTTGTCTGGACATGTTC
 >ElnDeiry_Sequence_12
AGACAAGCCTGGGCAGGTCC
 >ElnDeiry_Sequence_13
AAACAAGCCCGGATGTGCCC
 >ElnDeiry_Sequence_14
ACACTTGTCTATACCTGCCT
 >ElnDeiry_Sequence_15
AAACATGCTTTGACATGTTC
 >ElnDeiry_Sequence_16
GGACTTGCCCTGGCCAGCCC
 >ElnDeiry_Sequence_17
AGGTTTGCCGGGCTTGTTC
 >ElnDeiry_Sequence_18
TGACTTGCCCAGACATGTTT
 >ElnDeiry_Sequence_19
AAGCATGCCTTGACTTGTTC
 >ElnDeiry_Sequence_20
TGCCTTGCCTGGACTTGCCT


mark.schreiber at novartis.com wrote:

>Can you try the code in 
>http://www.biojava.org/docs/bj_in_anger/profileHMM.htm
>
>I have found in the past that you need to set some intial weights before 
>starting the BW trainer. If this example doesn't work please repost to the 
>list.
>
>- Mark
>
>
>
>  
>


More information about the Biojava-l mailing list