[Biojava-l] BaumWelchTrainer Broken??!!! (please help)
Todd Riley
toddri at eden.rutgers.edu
Mon Nov 21 02:13:52 EST 2005
I have built a profile HMM. I trained it by hand (setting the emission
and transition distributions manually) and was able to generate sensible
Viterbi scores for FASTA sequences. However, when I tried to perform
expectation maximization using the BaumWelchTrainer and a training set,
things did not go well at all. After the iterations finish, the
emission and transition distributions of the trained model are
full of NaNs! (Needless to say, Viterbi scoring is now
impossible. Any attempt to do so throws a NullPointerException on
line 650 of SingleDP.java in the SingleDP.viterbi() method.)
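To make the symptom concrete, here is a minimal sketch in plain Java (no BioJava dependency) of how a NaN can enter a probability table and how one might scan for it. This is a generic illustration, not a diagnosis of BaumWelchTrainer: the class NanCheck and the methods normalize and hasNaN are hypothetical names, and the assumption that weights are available as a double array is mine. It relies only on the fact that 0.0 / 0.0 evaluates to NaN in Java floating-point arithmetic.

```java
// Hypothetical helper, not part of BioJava.
final class NanCheck {

    /** Returns true if any weight in the array is NaN. */
    static boolean hasNaN(double[] weights) {
        for (double w : weights) {
            if (Double.isNaN(w)) {
                return true;
            }
        }
        return false;
    }

    /**
     * Turns raw counts into probabilities by dividing by their sum.
     * If every count is zero, each entry becomes 0.0 / 0.0, which is NaN.
     */
    static double[] normalize(double[] counts) {
        double total = 0.0;
        for (double c : counts) {
            total += c;
        }
        double[] probs = new double[counts.length];
        for (int i = 0; i < counts.length; i++) {
            probs[i] = counts[i] / total;
        }
        return probs;
    }

    public static void main(String[] args) {
        // Non-zero counts normalize cleanly.
        System.out.println(hasNaN(normalize(new double[] {2.0, 2.0})));
        // All-zero counts produce a table full of NaN.
        System.out.println(hasNaN(normalize(new double[] {0.0, 0.0})));
    }
}
```

Running a check like hasNaN on each distribution after every training cycle would at least show in which cycle the first NaN appears.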
I looked through the mail archives and found that Fabian Schreiber had the
same problem when he wrote a BaumWelchTrainer program just like
the one from BioJava in Anger: "How do I make a ProfileHMM?". His
message is from March 25th of this year and received no replies.
I then downloaded the BioJava 1.4 sources and found two
additional dp demos that use the BaumWelchTrainer:
demos/dp/PatternFinder.java
demos/dp/SearchProfile.java
I compiled and ran both of these demos and found very discouraging
results. The iteration scores quickly go to NaN, no matter what
sequences I train on (including the demos/dp/fake.fasta file).
Is there something that I am missing here? Is the BaumWelchTrainer
broken? Why are all the emission and transition distributions full
of NaNs after training?
Any insight or investigation here would be greatly appreciated.
Thanks,
Todd Riley
I am re-posting Fabian Schreiber's code because it is shorter than
mine:
import org.biojava.bio.dp.*;      // MarkovModel, DP, DPFactory, BaumWelchTrainer, StoppingCriteria, TrainingAlgorithm
import org.biojava.bio.seq.db.*;  // SequenceDB, HashSequenceDB

// Create the Markov model. createCasino() generates an alphabet and
// sets the probabilities for the transitions and emissions.
MarkovModel casino = createCasino();
DP dp = DPFactory.DEFAULT.createDP(casino);
BaumWelchTrainer bwtrainer = new BaumWelchTrainer(dp);
SequenceDB seqDB = new HashSequenceDB("hashdb");
// Here the DB is filled with the sequences (this part works).

// Set the stopper: training is complete once more than 10 cycles have run.
StoppingCriteria stopper = new StoppingCriteria() {
    public boolean isTrainingComplete(TrainingAlgorithm ta) {
        return ta.getCycle() > 10;
    }
};

// Train the model.
bwtrainer.train(seqDB, 1.0, stopper);
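The stopper above only counts cycles, so training keeps running after the score has already turned into NaN. A rule that also halts on a non-finite score might look like the sketch below. It is plain Java with no BioJava dependency: StopRule and isComplete are hypothetical names, and it assumes the current cycle number and score can be obtained from the trainer and passed in. In the BioJava version, the same condition would go inside isTrainingComplete.

```java
// Hypothetical stopping rule, not part of BioJava.
final class StopRule {

    private final int maxCycles;

    StopRule(int maxCycles) {
        this.maxCycles = maxCycles;
    }

    /**
     * Training is considered complete when the cycle limit is exceeded
     * or the score is no longer a finite number (NaN or infinite).
     */
    boolean isComplete(int cycle, double score) {
        return cycle > maxCycles
                || Double.isNaN(score)
                || Double.isInfinite(score);
    }

    public static void main(String[] args) {
        StopRule rule = new StopRule(10);
        System.out.println(rule.isComplete(3, -42.0));       // finite score, within limit
        System.out.println(rule.isComplete(3, Double.NaN));  // score went to NaN
        System.out.println(rule.isComplete(11, -42.0));      // cycle limit exceeded
    }
}
```

Stopping early does not fix whatever produces the NaN, but it leaves the model in the state of the first failing cycle, which is easier to inspect.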