[Biojava-l] Parameter Settings in BaumWelchTraining]

Fri Mar 12 03:47:50 EST 2004

On Fri, 12 Mar 2004 mark.schreiber at group.novartis.com wrote:

> When you call the train() method of the BaumWelchTrainer you supply it 
> with a SequenceDB. The sequences from this DB are used to optimize the 
> weights of the model.
> 
> However, I have a bad feeling that when you train your model with the 
> BaumWelchTrainer your previously set counts will be ignored and 
> overwritten. You could check by looking into AbstractModelTrainer.train() 
> (which is what the BaumWelchTrainer extends). You could also run some 
> tests to see if using a pre-trained model makes any difference to the 
> final outcome. Does anyone more expert than me on the DP package (i.e. most 
> people) know if the counts are overwritten?

The idea sounds good either way, so it would be a shame to have to reject
it on the basis of a technicality :)
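
For what it's worth, here is a minimal plain-Java sketch of why the tutorial
snippet quoted further down ends up uniform when nothing but
setNullModelWeight(1.0) is used. It is not BioJava code: the class and method
names are made up for illustration, and it assumes the trainer simply adds
(null weight x null-model probability) to each count and then normalises,
with a uniform null model. It says nothing about whether BaumWelchTrainer
keeps or resets the weights already in the model.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java illustration only; none of these names come from BioJava.
class NullModelWeightSketch {

    // Uniform null model over the given symbols (an assumption of this sketch).
    static Map<String, Double> uniform(String... symbols) {
        Map<String, Double> nullModel = new LinkedHashMap<>();
        for (String s : symbols) {
            nullModel.put(s, 1.0 / symbols.length);
        }
        return nullModel;
    }

    // Assumed update rule: weight(s) is proportional to
    // count(s) + nullWeight * nullModel(s), normalised to sum to 1.
    static Map<String, Double> train(Map<String, Double> counts,
                                     Map<String, Double> nullModel,
                                     double nullWeight) {
        Map<String, Double> weights = new LinkedHashMap<>();
        double total = 0.0;
        for (Map.Entry<String, Double> e : nullModel.entrySet()) {
            double v = counts.getOrDefault(e.getKey(), 0.0)
                     + nullWeight * e.getValue();
            weights.put(e.getKey(), v);
            total += v;
        }
        if (total <= 0.0) {
            throw new IllegalArgumentException("no counts and no null model weight");
        }
        // Normalise in place so the weights sum to 1.
        for (Map.Entry<String, Double> e : weights.entrySet()) {
            e.setValue(e.getValue() / total);
        }
        return weights;
    }

    public static void main(String[] args) {
        Map<String, Double> nullModel = uniform("a", "b", "c", "d");

        // No counts at all: only the null model contributes, so the result is uniform.
        System.out.println(train(new LinkedHashMap<>(), nullModel, 1.0));

        // Three counts for "a": the result is pulled towards "a".
        Map<String, Double> counts = new LinkedHashMap<>();
        counts.put("a", 3.0);
        System.out.println(train(counts, nullModel, 1.0));
    }
}
```

Under that assumption, an empty set of counts gives every symbol the same
weight, and any counts that are added pull the result away from uniform in
proportion to their size relative to the null weight.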

Cheers

> 
> - Mark
> 
> 
> 
> 
> 
> sacoca at mcb.mcgill.ca
> Sent by: biojava-l-bounces at portal.open-bio.org
> 03/12/2004 01:30 PM
> 
>  
>         To:     sacoca at mcb.mcgill.ca
>         cc:     Biojava Mailing List <biojava-l at biojava.org>
>         Subject:        Re: [Biojava-l] Parameter Settings in BaumWelchTraining]
> 
> 
> Sorry for the previous error.
> ---------------------------- Original Message ----------------------------
> Subject: Re: [Biojava-l] Parameter Settings in BaumWelchTraining
> From:    sacoca at MCB.McGill.CA
> Date:    Fri, March 12, 2004 12:27 am
> To:      mark.schreiber at group.novartis.com
> --------------------------------------------------------------------------
> 
> Here is the code I have for the training. Using what you told me below, I
> can retrieve all of the weights that I calculated manually for the HMM
> (distributions for the transitions and distributions for the alphabet of
> each state). What I do not understand is how to use this information and
> the sequences stored in a file to run the Baum-Welch algorithm and then
> retrieve the optimized values calculated by the algorithm to set them back
> into my HMM.
> 
> //Retrieve the alphabet of all states
> FiniteAlphabet SA = hmm.stateAlphabet();
> Iterator i = SA.iterator();
> 
> SimpleModelTrainer MT = new SimpleModelTrainer();
> MT.registerModel(hmm);
> 
> //go through each state
> while(i.hasNext())
> {Symbol Currentstate = (Symbol)i.next();
> 
>  //Retrieve the distribution of all transitions from the current state
>  FiniteAlphabet From = hmm.transitionsFrom((State)Currentstate);
>  Distribution d = hmm.getWeights((State)Currentstate);
>  Iterator i2 = From.iterator();
> 
>  //go through it and look at all the weights for each of the transitions
>  while(i2.hasNext())
>     {Symbol s = (Symbol)i2.next();
>      System.out.println("From state "+Currentstate.getName()+
>                         " To State "+s.getName()+
>                         " Weight "+d.getWeight(s));}
> 
>  //get the distribution for the alphabet of the current state
>  Distribution d2 =((EmissionState)Currentstate).getDistribution();
>  FiniteAlphabet IN = (FiniteAlphabet)hmm.emissionAlphabet();
>  Iterator i3 = IN.iterator();
>  //you can go through it the same way as above using a while loop
> *****************************************************************
> This is what I don't understand!!!!
> *****************************************************************
> Here, we have a set of training sequences stored in a file in FASTA format
> that I'd like to use with the Baum-Welch algorithm to optimize the
> transition distributions mentioned above.
> 
> //This is the file with all the training sequences
> BufferedInputStream is = new BufferedInputStream(new
> FileInputStream("z:/Sequences.faa"));
> 
> //Load the file with the SequenceDB class
> SequenceDB DB = SeqIOTools.readFasta(is, ProtAlphabet);
> 
> //use 100 cycles as the stop criteria
> StoppingCriteria stopper = new StoppingCriteria()
>      {public boolean isTrainingComplete(TrainingAlgorithm ta)
>        {return (ta.getCycle() > 100);}};
> 
> *****************************************
> This part is what I am clueless about
> *****************************************
> //How do I optimize my hmm with the Baum-Welch algorithm and retrieve
> //the optimized values? How do I train the distribution above with
> //the Baum-Welch and the sequences that I have?
> DP dp= DPFactory.DEFAULT.createDP(hmm);
> BaumWelchTrainer bwt = new BaumWelchTrainer(dp);
> }
> 
> PS : I do not know why you are helping all of us here but thank you. It
> makes Biojava a lot easier to deal with.
> 
> Steve
> 
> > Hi Stephane -
> >
> > Within EmissionState you can set a Distribution that contains emission
> > probabilities for the Symbols in the state's emission alphabet, using
> > the setDistribution method. This Distribution will hold your
> > predetermined weights.
> >
> > To set the transition probabilities you can use setWeights(State
> > source, Distribution weights). The source is the state you are
> > transitioning from, and the weights are the probabilities of
> > transitioning to each State that the source connects to. Because
> > States implement Symbol, you can put them in a Distribution.
> >
> > To make a Distribution over the States that state 'a' can connect to,
> > use the following pseudocode:
> >
> > State a;
> > Model m;
> > FiniteAlphabet endPoints;
> >
> > endPoints = m.transitionsFrom(a);
> > Distribution d =
> > DistributionFactory.DEFAULT.createDistribution(endPoints);
> >
> > //You can then train d or set its weights, and put it back in the
> > //model with
> >
> > m.setWeights(a, d);
> >
> > Mark Schreiber
> > Principal Scientist (Bioinformatics)
> >
> > Novartis Institute for Tropical Diseases (NITD)
> > 1 Science Park Road
> > #04-14 The Capricorn, Science Park II
> > Singapore 117528
> >
> > phone +65 6722 2973
> > fax  +65 6722 2910
> >
> >
> >
> >
> >
> > sacoca at mcb.mcgill.ca
> > Sent by: biojava-l-bounces at portal.open-bio.org
> > 03/12/2004 06:11 AM
> >
> >
> >         To:     "Biojava Mailing List" <biojava-l at biojava.org>
> >         cc:
> >         Subject:        [Biojava-l] Parameter Settings in BaumWelchTraining
> >
> >
> > Hi all. I'm trying to optimize the state transition probabilities for
> > my HMM. I have already set them to values which I think are pretty
> > good. Since I know that Baum-Welch can only optimize the scores up to
> > a local maximum, I thought of using the parameters I calculated as a
> > starting point. The problem is that I don't know how!
> > I followed the example in biojava:
> >
> > ....
> > //train the model to have uniform parameters
> >     ModelTrainer mt = new SimpleModelTrainer();
> >     //register the model to train
> >     mt.registerModel(hmm);
> >
> > I want to use the values already set in my hmm as the starting
> > parameters in the BaumWelch. I don't want to use the uniform
> > distribution as indicated below!
> >
> >     //as no other counts are being used the null weight will cause
> >     //everything to be uniform
> >     mt.setNullModelWeight(1.0);
> >     mt.train();
> >
> > I tried adding counts and looking up examples on the net but ended up
> > more confused than when I started. How do I use addCounts to make
> > this work?
> >
> > Stephane Acoca
> > Master's Student
> > McGill Center for Bioinformatics
> >
> > _______________________________________________
> > Biojava-l mailing list  -  Biojava-l at biojava.org
> > http://biojava.org/mailman/listinfo/biojava-l