[Biojava-l] Parameter Settings in BaumWelchTraining

sacoca at MCB.McGill.CA sacoca at MCB.McGill.CA
Fri Mar 12 00:30:16 EST 2004


Sorry for the previous error.
---------------------------- Original Message ----------------------------
Subject: Re: [Biojava-l] Parameter Settings in BaumWelchTraining
From:    sacoca at MCB.McGill.CA
Date:    Fri, March 12, 2004 12:27 am
To:      mark.schreiber at group.novartis.com
--------------------------------------------------------------------------

Here is the code I have for the training. Using what you told me below, I
can retrieve all of the weights that I calculated manually for the HMM
(the transition distributions and the emission distribution over the
alphabet of each state). What I do not understand is how to use this
information and the sequences stored in a file to run the Baum-Welch
algorithm and then retrieve the optimized values calculated by the
algorithm to set them back into my HMM.

// Retrieve the alphabet of all states
FiniteAlphabet SA = hmm.stateAlphabet();
Iterator i = SA.iterator();

SimpleModelTrainer MT = new SimpleModelTrainer();
MT.registerModel(hmm);

// Go through each state
while (i.hasNext()) {
    Symbol Currentstate = (Symbol) i.next();

    // Retrieve the distribution over all transitions from the current state
    FiniteAlphabet From = hmm.transitionsFrom((State) Currentstate);
    Distribution d = hmm.getWeights((State) Currentstate);
    Iterator i2 = From.iterator();

    // Go through it and print the weight of each transition
    while (i2.hasNext()) {
        Symbol s = (Symbol) i2.next();
        System.out.println("From state " + Currentstate.getName()
                         + "  To state " + s.getName()
                         + "  Weight " + d.getWeight(s));
    }

    // Get the emission distribution of the current state
    Distribution d2 = ((EmissionState) Currentstate).getDistribution();
    FiniteAlphabet IN = (FiniteAlphabet) hmm.emissionAlphabet();
    Iterator i3 = IN.iterator();
    // You can go through it the same way as above using a while loop
*****************************************************************
This is what I don't understand!!!!
*****************************************************************
Here, we have a set of training sequences stored in a file in FASTA format
that I'd like to use with the Baum-Welch algorithm to optimize the
transition distributions mentioned above.

// This is the file with all the training sequences
BufferedInputStream is = new BufferedInputStream(
        new FileInputStream("z:/Sequences.faa"));

// Load the file into a SequenceDB
SequenceDB DB = SeqIOTools.readFasta(is, ProtAlphabet);

// Use 100 cycles as the stopping criterion
StoppingCriteria stopper = new StoppingCriteria() {
    public boolean isTrainingComplete(TrainingAlgorithm ta) {
        return ta.getCycle() > 100;
    }
};

*****************************************
This part is what I am clueless about
*****************************************
// How do I optimize my HMM with the Baum-Welch algorithm and retrieve
// the optimized values? How do I train the distributions above with
// Baum-Welch and the sequences that I have?
DP dp = DPFactory.DEFAULT.createDP(hmm);
BaumWelchTrainer bwt = new BaumWelchTrainer(dp);
}
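
For readers following the thread, here is a minimal sketch, in plain Java
with no BioJava classes, of what one Baum-Welch cycle does to hand-set
starting values. The class BaumWelchSketch, its method names, and the toy
two-state model are all hypothetical and chosen only for illustration; pi, a
and b stand for the start, transition and emission probabilities. It assumes
short sequences (no scaling against underflow) and strictly positive starting
probabilities. The point it tries to show is that the values already in a and
b are the starting point: each cycle re-estimates them from expected counts
instead of resetting them.

```java
import java.util.Arrays;

/** Plain-Java Baum-Welch sketch; not BioJava code. All names are hypothetical. */
final class BaumWelchSketch {

    /** Forward table: f[t][i] = P(o[0..t], state at t = i). */
    static double[][] forward(double[] pi, double[][] a, double[][] b, int[] o) {
        int n = pi.length;
        double[][] f = new double[o.length][n];
        for (int i = 0; i < n; i++) f[0][i] = pi[i] * b[i][o[0]];
        for (int t = 1; t < o.length; t++)
            for (int j = 0; j < n; j++) {
                double s = 0;
                for (int i = 0; i < n; i++) s += f[t - 1][i] * a[i][j];
                f[t][j] = s * b[j][o[t]];
            }
        return f;
    }

    /** Backward table: r[t][i] = P(o[t+1..end] | state at t = i). */
    static double[][] backward(double[][] a, double[][] b, int[] o) {
        int n = a.length;
        double[][] r = new double[o.length][n];
        Arrays.fill(r[o.length - 1], 1.0);
        for (int t = o.length - 2; t >= 0; t--)
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    r[t][i] += a[i][j] * b[j][o[t + 1]] * r[t + 1][j];
        return r;
    }

    /** Probability of one observation sequence under the current parameters. */
    static double likelihood(double[] pi, double[][] a, double[][] b, int[] o) {
        double[][] f = forward(pi, a, b, o);
        return Arrays.stream(f[o.length - 1]).sum();
    }

    /** One re-estimation cycle over all sequences; updates a and b in place, pi stays fixed. */
    static void cycle(double[] pi, double[][] a, double[][] b, int[][] seqs) {
        int n = a.length, m = b[0].length;
        double[][] ta = new double[n][n], eb = new double[n][m]; // expected counts
        for (int[] o : seqs) {
            double[][] f = forward(pi, a, b, o), r = backward(a, b, o);
            double p = Arrays.stream(f[o.length - 1]).sum(); // assumed > 0
            for (int t = 0; t < o.length; t++)
                for (int i = 0; i < n; i++) {
                    eb[i][o[t]] += f[t][i] * r[t][i] / p;
                    if (t + 1 < o.length)
                        for (int j = 0; j < n; j++)
                            ta[i][j] += f[t][i] * a[i][j] * b[j][o[t + 1]] * r[t + 1][j] / p;
                }
        }
        // Only now overwrite the old parameters with the normalised counts.
        for (int i = 0; i < n; i++) {
            normalise(ta[i], a[i]);
            normalise(eb[i], b[i]);
        }
    }

    /** Writes counts / sum(counts) into target; keeps the old values if no counts were seen. */
    static void normalise(double[] counts, double[] target) {
        double sum = Arrays.stream(counts).sum();
        if (sum == 0) return;
        for (int k = 0; k < counts.length; k++) target[k] = counts[k] / sum;
    }

    public static void main(String[] args) {
        double[] pi = {0.5, 0.5};
        double[][] a = {{0.9, 0.1}, {0.2, 0.8}}; // hand-set starting transitions
        double[][] b = {{0.7, 0.3}, {0.1, 0.9}}; // hand-set starting emissions
        int[][] seqs = {{0, 0, 1, 1, 1, 0}, {1, 1, 1, 0, 0}}; // toy training data
        for (int c = 0; c < 100; c++) cycle(pi, a, b, seqs);
        System.out.println(Arrays.deepToString(a));
        System.out.println(Arrays.deepToString(b));
    }
}
```

After training, a and b hold the optimized values and can be read back
directly. In BioJava 1.x the corresponding step is, as far as I recall from
the cookbook HMM example, a call of the form bwt.train(DB, nullModelWeight,
stopper) on the trainer created above, after which the model's weights can be
read with the same getWeights and getDistribution calls used earlier; this is
worth checking against the Javadoc for your version.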

PS : I do not know why you are helping all of us here but thank you. It
makes Biojava a lot easier to deal with.

Steve

> Hi Stephane -
>
> Within EmissionState you can set a Distribution that contains emission
> probabilities for the Symbols in the state's emission alphabet using the
> setDistribution method. This Distribution will hold your predetermined
> weights.
>
> To set the transition probabilities you can use setWeights(State
> source, Distribution weights). The source is the state you are
> transitioning from, and the weights are the probabilities of
> transitioning to any State that the source connects to. Because States
> implement Symbol you can put them in a Distribution.
>
> To make a Distribution of the States that state 'a' could connect to,
> use the following pseudocode:
>
> State a;
> Model m;
> FiniteAlphabet endPoints;
>
> endPoints = m.transitionsFrom(a);
> Distribution d =
> DistributionFactory.DEFAULT.createDistribution(endPoints);
>
> // You can then train d or set its weights and put it back in the model with
>
> m.setWeights(a, d);
>
> Mark Schreiber
> Principal Scientist (Bioinformatics)
>
> Novartis Institute for Tropical Diseases (NITD)
> 1 Science Park Road
> #04-14 The Capricorn, Science Park II
> Singapore 117528
>
> phone +65 6722 2973
> fax  +65 6722 2910
>
>
>
>
>
> sacoca at mcb.mcgill.ca
> Sent by: biojava-l-bounces at portal.open-bio.org
> 03/12/2004 06:11 AM
>
>
>         To:     "Biojava Mailing List" <biojava-l at biojava.org>
>         cc:
>         Subject:        [Biojava-l] Parameter Settings in
> BaumWelchTraining
>
>
> Hi all. I'm trying to optimize the state transition probabilities for
> my HMM. I have already set them to values which I think are pretty good.
> Since I know Baum-Welch can only optimize the scores up to a local
> maximum, I thought of using the parameters I calculated as a starting
> point. The problem is that I don't know how!
> I followed the example in biojava:
>
> ....
> //train the model to have uniform parameters
>     ModelTrainer mt = new SimpleModelTrainer();
>     //register the model to train
>     mt.registerModel(hmm);
>
> I want to use the values already set in my HMM as the starting
> parameters in the BaumWelch. I don't want to use the uniform
> distribution as indicated below!
>
>     //as no other counts are being used the null weight will cause
>     //everything to be uniform
>     mt.setNullModelWeight(1.0);
>     mt.train();
>
> I tried adding counts and looking up examples on the net but ended up
> more confused than when I started. How do I use addCounts to make this
> work?
>
> Stephane Acoca
> Master's Student
> McGill Center for Bioinformatics
>
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at biojava.org
> http://biojava.org/mailman/listinfo/biojava-l
>
>
>




