[Biojava-l] BaumWelchTrainer Broken??!!! (please help)
mark.schreiber at novartis.com
Mon Nov 21 02:43:32 EST 2005
Can you try the code in
http://www.biojava.org/docs/bj_in_anger/profileHMM.htm
I have found in the past that you need to set some initial weights before
starting the BW trainer. If this example doesn't work please repost to the
list.
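As a rough illustration of what "setting initial weights" means, the sketch below fills a weight array with either uniform or randomized values that sum to 1 before any training starts. It is plain Java and does not use the BioJava API; the names `InitialWeights`, `uniform` and `randomized` are hypothetical stand-ins for whatever distribution objects the model exposes.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical helper, not part of BioJava: builds starting weights for one
// emission or transition distribution.
class InitialWeights {

    // Every outcome gets the same probability, 1/n.
    static double[] uniform(int n) {
        double[] w = new double[n];
        Arrays.fill(w, 1.0 / n);
        return w;
    }

    // Random positive weights, normalized so they sum to 1.
    static double[] randomized(int n, Random rng) {
        double[] w = new double[n];
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            w[i] = rng.nextDouble() + 1e-6; // keep every weight strictly positive
            total += w[i];
        }
        for (int i = 0; i < n; i++) {
            w[i] /= total;
        }
        return w;
    }

    public static void main(String[] args) {
        // Starting weights for a four-symbol alphabet such as DNA.
        System.out.println(Arrays.toString(uniform(4)));
        System.out.println(Arrays.toString(randomized(4, new Random(42))));
    }
}
```

Randomized weights break the symmetry between states, while uniform weights give a neutral starting point; which one suits a given model is a judgment call.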
- Mark
Todd Riley <toddri at eden.rutgers.edu>
Sent by: biojava-l-bounces at portal.open-bio.org
11/21/2005 03:13 PM
To: biojava-l at biojava.org
cc: (bcc: Mark Schreiber/GP/Novartis)
Subject: [Biojava-l] BaumWelchTrainer Broken??!!! (please help)
I have built a Profile HMM. I hand trained it (setting the emission
and transition distributions by hand) and was able to generate nice
viterbi scores of fasta sequences. However, when I tried to perform
Expectation Maximization using the BaumWelchTrainer and a training set,
things did not go well at all. After the iterations are done, the
emission and transition distributions of the now-trained model are
full of NaNs!!! (Needless to say, Viterbi scoring is now
impossible. Any attempt to do so generates a NullPointerException on
line 650 of SingleDP.java in the SingleDP.viterbi() method.)
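One general way NaN can enter a re-estimated distribution is by normalizing counts whose total is zero, since 0.0 / 0.0 is NaN in Java floating-point arithmetic. Whether this is what happens inside the BaumWelchTrainer is an unverified assumption; the sketch below is plain Java with hypothetical names (`NanDemo`, `normalize`, `hasNaN`) and only shows the arithmetic plus a quick check for spotting NaN weights.

```java
// Hypothetical illustration, not BioJava code.
class NanDemo {

    // Divide each count by the total so the weights sum to 1.
    // If the total is 0.0, every weight becomes 0.0 / 0.0, which is NaN.
    static double[] normalize(double[] counts) {
        double total = 0.0;
        for (double c : counts) {
            total += c;
        }
        double[] weights = new double[counts.length];
        for (int i = 0; i < counts.length; i++) {
            weights[i] = counts[i] / total;
        }
        return weights;
    }

    // True if any weight is NaN.
    static boolean hasNaN(double[] weights) {
        for (double w : weights) {
            if (Double.isNaN(w)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasNaN(normalize(new double[] {0.0, 0.0, 0.0}))); // true
        System.out.println(hasNaN(normalize(new double[] {1.0, 1.0, 2.0}))); // false
    }
}
```

Running a check like `hasNaN` on each distribution after every cycle would show at which cycle the first NaN appears.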
I looked into the mail archives and found that Fabian Schreiber had the
exact same problem when he wrote a BaumWelchTrainer program exactly like
the one from Biojava in Anger: "How do I make a ProfileHMM?". His
message is from March 25th of this year (with no replies).
I then decided to download the BioJava 1.4 sources and found 2
additional (dp) demos that use the BaumWelchTrainer:
demos/dp/PatternFinder.java
demos/dp/SearchProfile.java
I compiled and ran both of these demos and found very discouraging
results. The iteration scores quickly go to NaN, no matter what
sequences I train on (including the demos/dp/fake.fasta file).
Is there something that I am missing here? Is the BaumWelchTrainer
broken? Why are all the emission and transition distributions now full
of NaNs after training?
Any insight or investigation here would be greatly appreciated.
Thanks,
Todd Riley
I am re-posting Fabian Schreiber's code because it is shorter than
mine......
// Create the Markov model. The method createCasino generates an Alphabet
// and sets the probabilities for the transitions and emissions.
MarkovModel casino = createCasino();
DP dp = DPFactory.DEFAULT.createDP(casino);
BaumWelchTrainer bwtrainer = new BaumWelchTrainer(dp);
SequenceDB seqDB = new HashSequenceDB("hashdb");
// Here the DB is filled with the sequences --> this works

// Set the stopper: training is complete once more than 10 cycles have run
StoppingCriteria stopper = new StoppingCriteria() {
    public boolean isTrainingComplete(TrainingAlgorithm ta) {
        return (ta.getCycle() > 10);
    }
};

// Train the model
bwtrainer.train(seqDB, 1.0, stopper);
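A stopping rule can also watch the score rather than only the cycle count, so that training halts as soon as the score turns NaN. The sketch below is plain Java; the `TrainingProgress` interface is a hypothetical stand-in for the trainer object (assumed to expose the cycle number and the current and previous scores) and is not the BioJava interface itself. The `ScoreStopper` class and its thresholds are likewise illustrative.

```java
// Hypothetical stand-in for the trainer; not the BioJava interface.
interface TrainingProgress {
    int getCycle();
    double getCurrentScore();
    double getLastScore();
}

// Stops on a NaN score, on a cycle limit, or when the score stops changing.
class ScoreStopper {
    private final int maxCycles;
    private final double minChange;

    ScoreStopper(int maxCycles, double minChange) {
        this.maxCycles = maxCycles;
        this.minChange = minChange;
    }

    boolean isTrainingComplete(TrainingProgress p) {
        double current = p.getCurrentScore();
        if (Double.isNaN(current)) {
            return true; // nothing useful can come from further cycles
        }
        if (p.getCycle() > maxCycles) {
            return true; // hard limit, as in the cycle-count stopper
        }
        // Assumption: a previous score only exists after the first cycle.
        return p.getCycle() > 1
                && Math.abs(current - p.getLastScore()) < minChange;
    }

    public static void main(String[] args) {
        TrainingProgress broken = new TrainingProgress() {
            public int getCycle() { return 3; }
            public double getCurrentScore() { return Double.NaN; }
            public double getLastScore() { return -10.0; }
        };
        System.out.println(new ScoreStopper(10, 1e-3).isTrainingComplete(broken)); // true
    }
}
```

Stopping on NaN does not fix the underlying problem, but it pins down the cycle at which the score first goes bad.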
_______________________________________________
Biojava-l mailing list - Biojava-l at biojava.org
http://biojava.org/mailman/listinfo/biojava-l