[Biojava-l] BaumWelchTrainer Broken??!!! (please help)

mark.schreiber at novartis.com
Mon Nov 21 02:43:32 EST 2005


Can you try the code in 
http://www.biojava.org/docs/bj_in_anger/profileHMM.htm

I have found in the past that you need to set some initial weights before 
starting the BW trainer. If this example doesn't work, please repost to the 
list.
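A library-independent sketch of why initial weights can matter. This is an illustration under assumptions, not BioJava's code: Baum-Welch re-estimates each distribution by normalizing expected counts, and a state that collects no expected counts in a cycle gives 0.0 / 0.0 = NaN. Seeding every count with a small pseudocount keeps the row well defined. The class PseudocountDemo and its normalize() method are hypothetical names.

```java
import java.util.Arrays;

public class PseudocountDemo {
    /**
     * Turns expected counts into probabilities. A row whose counts sum to
     * zero would give 0.0 / 0.0 = NaN, so every cell is first seeded with a
     * pseudocount (a hypothetical stand-in for "initial weights").
     */
    public static double[] normalize(double[] expectedCounts, double pseudocount) {
        double[] p = new double[expectedCounts.length];
        double total = 0.0;
        for (int i = 0; i < p.length; i++) {
            p[i] = expectedCounts[i] + pseudocount;
            total += p[i];
        }
        for (int i = 0; i < p.length; i++) {
            p[i] /= total;
        }
        return p;
    }

    public static void main(String[] args) {
        // A state that was never visited while aligning the training set
        double[] unusedState = {0.0, 0.0, 0.0, 0.0};
        System.out.println(Arrays.toString(normalize(unusedState, 0.0))); // [NaN, NaN, NaN, NaN]
        System.out.println(Arrays.toString(normalize(unusedState, 1.0))); // [0.25, 0.25, 0.25, 0.25]
    }
}
```

The pseudocount size is a judgment call. Larger values pull the trained distributions toward uniform, while smaller values let the data dominate.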

- Mark





Todd Riley <toddri at eden.rutgers.edu>
Sent by: biojava-l-bounces at portal.open-bio.org
11/21/2005 03:13 PM

 
        To:     biojava-l at biojava.org
        cc:     (bcc: Mark Schreiber/GP/Novartis)
        Subject:        [Biojava-l] BaumWelchTrainer Broken??!!!  (please help)


I have built a profile HMM. I hand-trained it (setting the emission 
and transition distributions by hand) and was able to get good 
Viterbi scores for FASTA sequences. However, when I tried to perform 
expectation maximization using the BaumWelchTrainer and a training set, 
things did not go well at all. After the iterations finish, all of 
the emission and transition distributions of the trained model are 
full of NaNs! (Needless to say, Viterbi scoring is now 
impossible: any attempt to do so throws a NullPointerException at 
line 650 of SingleDP.java, in the SingleDP.viterbi() method.)
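For context on how a few NaNs can end up filling every distribution, here is a short sketch. In Java floating point, the log of a zero probability is -Infinity, and both -Infinity - (-Infinity) and 0.0 / 0.0 are NaN. Any later arithmetic that touches a NaN stays NaN. The sketch (hypothetical class NanSources) only demonstrates these IEEE 754 rules; it does not show where BioJava's trainer actually produces its first NaN.

```java
import java.util.Arrays;

public class NanSources {
    /** Returns {log(0), log(0) - log(0), 0/0, an update that touches NaN}. */
    public static double[] examples() {
        double zero = 0.0;
        double logZero = Math.log(zero);      // -Infinity: log-probability of an impossible path
        double logRatio = logZero - logZero;  // NaN: log-space ratio of two underflowed terms
        double ratio = zero / zero;           // NaN: linear-space ratio of two zero counts
        double mixed = 0.5 * ratio + 0.1;     // NaN: anything combined with NaN stays NaN
        return new double[] {logZero, logRatio, ratio, mixed};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(examples())); // [-Infinity, NaN, NaN, NaN]
    }
}
```

Because each re-estimated parameter feeds the next forward/backward pass, one NaN in a single cycle is enough to contaminate the whole model by the following cycle.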

I looked through the mail archives and found that Fabian Schreiber had the 
same problem when he wrote a BaumWelchTrainer program exactly like 
the one from BioJava in Anger: "How do I make a ProfileHMM?". His 
message is from March 25th of this year (with no replies).

I then downloaded the BioJava 1.4 sources and found two 
additional demos in the dp directory that use the BaumWelchTrainer:
    demos/dp/PatternFinder.java
    demos/dp/SearchProfile.java

I compiled and ran both of these demos, with very discouraging 
results: the iteration scores quickly go to NaN no matter what 
sequences I train on (including the demos/dp/fake.fasta file).
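One side effect worth noting when scores go to NaN: a stopping rule that only tests for convergence, such as |current - last| < tolerance, never fires once the score is NaN, because every comparison involving NaN is false. The sketch below is a library-independent stopping rule (hypothetical class StopRule). It could be called from an isTrainingComplete() implementation, assuming the training algorithm exposes its current and previous scores; check the BioJava Javadoc for the exact accessors.

```java
public class StopRule {
    /**
     * Hypothetical stopping rule for an EM loop: stop on a NaN score,
     * after maxCycles, or once the score change drops below tolerance.
     */
    public static boolean shouldStop(int cycle, double lastScore, double currentScore,
                                     int maxCycles, double tolerance) {
        if (Double.isNaN(currentScore)) {
            return true; // training has diverged; further cycles cannot recover
        }
        if (cycle >= maxCycles) {
            return true;
        }
        // On its own this test would never be true for a NaN score
        return Math.abs(currentScore - lastScore) < tolerance;
    }

    public static void main(String[] args) {
        System.out.println(shouldStop(3, -120.5, Double.NaN, 10, 1e-4));  // true
        System.out.println(shouldStop(3, -120.5, -120.49999, 10, 1e-4));  // true
        System.out.println(shouldStop(3, -130.0, -120.5, 10, 1e-4));      // false
    }
}
```

Checking for NaN first also makes the failure visible at the cycle where it happens, rather than after the cycle limit is reached.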

Is there something I am missing here? Is the BaumWelchTrainer 
broken? Why are all the emission and transition distributions full 
of NaNs after training?

Any insight or investigation here would be greatly appreciated.

Thanks,
Todd Riley

I am re-posting Fabian Schreiber's code because it is shorter than 
mine:

import org.biojava.bio.dp.*;
import org.biojava.bio.seq.db.HashSequenceDB;
import org.biojava.bio.seq.db.SequenceDB;

// (inside a method declared to throw Exception)

// Create the Markov model. createCasino() (not shown) builds the alphabet
// and sets the transition and emission probabilities.
MarkovModel casino = createCasino();

DP dp = DPFactory.DEFAULT.createDP(casino);

BaumWelchTrainer bwtrainer = new BaumWelchTrainer(dp);

SequenceDB seqDB = new HashSequenceDB("hashdb");
// here the DB is filled with the sequences --> this works

// Set the stopper: finish once the cycle counter exceeds 10
StoppingCriteria stopper = new StoppingCriteria() {
    public boolean isTrainingComplete(TrainingAlgorithm ta) {
        return ta.getCycle() > 10;
    }
};

// Train the model (null model weight 1.0)
bwtrainer.train(seqDB, 1.0, stopper);
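For comparison, here is a library-independent sketch of one generic way forward/backward computations collapse to zero (and then NaN): multiplying probabilities directly underflows on long sequences. Scaling each step, or working in log space, avoids this. It does not diagnose BioJava's trainer. The "casino" parameters below are hypothetical textbook-style values (fair die vs. loaded die), not the ones createCasino() sets, and the class ScaledForward is an illustrative name.

```java
public class ScaledForward {
    // Hypothetical casino model: state 0 = fair die, state 1 = loaded die;
    // symbols 0..5 stand for faces 1..6.
    static final double[] START = {0.5, 0.5};
    static final double[][] TRANS = {{0.95, 0.05}, {0.10, 0.90}};
    static final double[][] EMIT = {
        {1.0 / 6, 1.0 / 6, 1.0 / 6, 1.0 / 6, 1.0 / 6, 1.0 / 6},
        {0.1, 0.1, 0.1, 0.1, 0.1, 0.5}
    };

    /** One forward recursion step; prev == null means "use the start distribution". */
    static double[] forwardStep(double[] prev, int symbol) {
        double[] next = new double[2];
        for (int j = 0; j < 2; j++) {
            double in = 0.0;
            if (prev == null) {
                in = START[j];
            } else {
                for (int i = 0; i < 2; i++) {
                    in += prev[i] * TRANS[i][j];
                }
            }
            next[j] = in * EMIT[j][symbol];
        }
        return next;
    }

    /** Plain forward algorithm: returns P(obs); underflows to 0.0 on long inputs. */
    public static double unscaledLikelihood(int[] obs) {
        double[] alpha = null;
        for (int t = 0; t < obs.length; t++) {
            alpha = forwardStep(alpha, obs[t]);
        }
        return alpha[0] + alpha[1];
    }

    /** Scaled forward algorithm: returns log P(obs) without underflow. */
    public static double logLikelihood(int[] obs) {
        double logL = 0.0;
        double[] alpha = null;
        for (int t = 0; t < obs.length; t++) {
            alpha = forwardStep(alpha, obs[t]);
            double scale = alpha[0] + alpha[1];
            alpha[0] /= scale;        // renormalize so alpha sums to 1
            alpha[1] /= scale;
            logL += Math.log(scale);  // log P(obs) is the sum of log scale factors
        }
        return logL;
    }

    public static void main(String[] args) {
        int[] rolls = new int[1000];
        for (int t = 0; t < rolls.length; t++) {
            rolls[t] = t % 6;
        }
        System.out.println("unscaled P = " + unscaledLikelihood(rolls)); // unscaled P = 0.0
        System.out.println("scaled log P = " + logLikelihood(rolls));    // a finite negative number
    }
}
```

On short sequences both versions agree (the log of the unscaled result equals the scaled one up to rounding); only the scaled version stays usable as sequences grow.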


_______________________________________________
Biojava-l mailing list  -  Biojava-l at biojava.org
http://biojava.org/mailman/listinfo/biojava-l




