[Bioperl-l] Hidden Markov Model in Bioperl?
Aaron J. Mackey
amackey at pcbi.upenn.edu
Mon Mar 28 08:11:33 EST 2005
Yes, in bioperl-ext, of course ...
On Mar 25, 2005, at 6:49 PM, Yee Man Chan wrote:
> I am thinking of an interface like this:
>
> Bio::Tools::HMM->new("symbols", "states")
> - instantiate an HMM object with a string of symbols (each character
> corresponds to one symbol) and a string of states. Other parameters of
> the model are generated randomly. Good for starting a Baum-Welch
> training.
Why not expand this to take two arrayrefs, one of symbols and one of
states? You can convert them internally into whatever encoded
single-char alphabet you'd like. Think Perl, not C. This is a feature
request, not a requirement, of course.
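
To illustrate the arrayref suggestion, here is a minimal,
language-neutral sketch (in Python rather than Perl, purely for
illustration) of mapping arbitrary labels onto a single-character
alphabet and back. The function names and the choice of private-use
code points are assumptions made for this example, not part of any
Bioperl interface.

```python
def build_codec(labels):
    """Map each distinct label to one character, and back again."""
    if len(set(labels)) != len(labels):
        raise ValueError("labels must be unique")
    # Assumption: Unicode private-use code points are used so the
    # encoded characters cannot clash with ordinary text.
    encode = {label: chr(0xE000 + i) for i, label in enumerate(labels)}
    decode = {ch: label for label, ch in encode.items()}
    return encode, decode


def encode_seq(seq, encode):
    """Pack a list of labels into a string, one character per label."""
    return "".join(encode[label] for label in seq)


def decode_seq(packed, decode):
    """Unpack an encoded string back into the list of labels."""
    return [decode[ch] for ch in packed]


# Hypothetical state names, longer than one character each.
states = ["exon", "intron", "intergenic"]
enc, dec = build_codec(states)
packed = encode_seq(["exon", "exon", "intron"], enc)
unpacked = decode_seq(packed, dec)
```

The caller only ever sees lists of labels; the single-character
encoding stays an internal detail of whatever C code does the work.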
> Bio::Tools::HMM->ObsSeqProb("string of observed sequence")
> - return the probability of an observed sequence.
Is this the Forward algorithm's P()? If so, perhaps make it an alias to
Forward(), with the ability to specify an offset/index at which you
want the Forward value (see below). Or is this the product of Viterbi
factors?
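
For reference, a small sketch of the Forward computation with an
optional index, written in Python for illustration only. The function
names, the "upto" parameter, and the two-state coin model are all
assumptions made for this example. "Forward value at an index" is read
here as the probability of the observed prefix up to that index, summed
over states; the per-state column is available from forward() if that
is what is wanted instead.

```python
def forward(obs, states, init, trans, emit):
    """Return forward columns: cols[t][s] = P(obs[0..t], state at t is s)."""
    cols = []
    for t, symbol in enumerate(obs):
        col = {}
        for s in states:
            if t == 0:
                prior = init[s]
            else:
                prior = sum(cols[t - 1][r] * trans[r][s] for r in states)
            col[s] = prior * emit[s][symbol]
        cols.append(col)
    return cols


def obs_seq_prob(obs, states, init, trans, emit, upto=None):
    """P(obs[0..upto]); upto=None means the whole (non-empty) sequence."""
    cols = forward(obs, states, init, trans, emit)
    index = len(obs) - 1 if upto is None else upto
    return sum(cols[index].values())


# Hypothetical toy model: a fair coin "F" and a biased coin "B".
states = ["F", "B"]
init = {"F": 0.5, "B": 0.5}
trans = {"F": {"F": 0.9, "B": 0.1}, "B": {"F": 0.1, "B": 0.9}}
emit = {"F": {"H": 0.5, "T": 0.5}, "B": {"H": 0.9, "T": 0.1}}

p_full = obs_seq_prob(list("HT"), states, init, trans, emit)
p_prefix = obs_seq_prob(list("HT"), states, init, trans, emit, upto=0)
```

This sketch multiplies raw probabilities, so it will underflow on long
sequences; a real implementation would scale each column or work in log
space.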
> Bio::Tools::HMM->Viterbi("string of observed sequence")
> - return a string of hidden states that maximizes the probability of
> the observed sequence.
This might also return the P() of the Viterbi path; and again, instead
of returning a string of symbols, return an arrayref of symbols.
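
A minimal sketch of what such a return value could look like, again in
Python for illustration only: the decoder hands back both the path (as
a list of state labels) and the probability of that path. The function
name and the toy coin model are assumptions for this example.

```python
def viterbi(obs, states, init, trans, emit):
    """Return (most probable state path as a list, probability of that path)."""
    # best[t][s] = probability of the best path that ends in state s at time t
    best = [{s: init[s] * emit[s][obs[0]] for s in states}]
    back = []  # back[t-1][s] = predecessor of s on the best path into time t
    for symbol in obs[1:]:
        col, ptr = {}, {}
        for s in states:
            prev = max(states, key=lambda r: best[-1][r] * trans[r][s])
            ptr[s] = prev
            col[s] = best[-1][prev] * trans[prev][s] * emit[s][symbol]
        best.append(col)
        back.append(ptr)
    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, best[-1][last]


# Hypothetical toy model: a fair coin "F" and a biased coin "B".
states = ["F", "B"]
init = {"F": 0.5, "B": 0.5}
trans = {"F": {"F": 0.9, "B": 0.1}, "B": {"F": 0.1, "B": 0.9}}
emit = {"F": {"H": 0.5, "T": 0.5}, "B": {"H": 0.9, "T": 0.1}}

path, prob = viterbi(list("HHH"), states, init, trans, emit)
```

Returning the pair keeps the caller from having to rescore the path
separately. As with Forward, a production version would work in log
space to avoid underflow.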
> Bio::Tools::HMM->getInitArray()
> Bio::Tools::HMM->getStateMatrix()
> Bio::Tools::HMM->getEmissionMatrix()
Presumably these should be get/set methods?
What's missing is 1) posterior decoding and 2) partial path probability,
i.e. F_{i}*v_{i+1}*v_{i+2}*...*v_{j-1}*B_{j}/F_{x}, where i < j, F and
B are Forward and Backward values, and the v's are Viterbi factors for
each step in the partial path specified from i to j.
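
Both quantities fall out of the Forward and Backward tables. The sketch
below (Python, for illustration only) assumes one common convention:
the Forward value at a position includes the emission there, the
Backward value excludes it, and each step factor is transition times
emission, so the step factors run through position j. Function names
and the toy model are assumptions for this example; the denominator is
the total probability of the observed sequence.

```python
def forward(obs, states, init, trans, emit):
    """cols[t][s] = P(obs[0..t], state at t is s)."""
    cols = [{s: init[s] * emit[s][obs[0]] for s in states}]
    for symbol in obs[1:]:
        prev = cols[-1]
        cols.append({s: emit[s][symbol] * sum(prev[r] * trans[r][s] for r in states)
                     for s in states})
    return cols


def backward(obs, states, trans, emit):
    """cols[t][s] = P(obs[t+1..end] | state at t is s)."""
    cols = [{s: 1.0 for s in states}]
    for symbol in reversed(obs[1:]):
        nxt = cols[0]
        cols.insert(0, {s: sum(trans[s][r] * emit[r][symbol] * nxt[r] for r in states)
                        for s in states})
    return cols


def posterior(obs, states, init, trans, emit):
    """Posterior decoding table: result[t][s] = P(state at t is s | obs)."""
    f = forward(obs, states, init, trans, emit)
    b = backward(obs, states, trans, emit)
    total = sum(f[-1].values())
    return [{s: f[t][s] * b[t][s] / total for s in states} for t in range(len(obs))]


def partial_path_prob(obs, path, start, states, init, trans, emit):
    """P(states at start..start+len(path)-1 equal path | obs)."""
    f = forward(obs, states, init, trans, emit)
    b = backward(obs, states, trans, emit)
    total = sum(f[-1].values())
    p = f[start][path[0]]
    for k in range(1, len(path)):
        # One step factor: transition into path[k] times its emission.
        p *= trans[path[k - 1]][path[k]] * emit[path[k]][obs[start + k]]
    return p * b[start + len(path) - 1][path[-1]] / total


# Hypothetical toy model: a fair coin "F" and a biased coin "B".
states = ["F", "B"]
init = {"F": 0.5, "B": 0.5}
trans = {"F": {"F": 0.9, "B": 0.1}, "B": {"F": 0.1, "B": 0.9}}
emit = {"F": {"H": 0.5, "T": 0.5}, "B": {"H": 0.9, "T": 0.1}}

obs = list("HHTH")
post = posterior(obs, states, init, trans, emit)
p_bb = partial_path_prob(obs, ["B", "B"], 0, states, init, trans, emit)
```

A partial path of length one reduces to the posterior at that position,
which is a handy sanity check for either routine.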
I'd also prefer lower case names (BaumWelch could just be called
"train" or "learn_unsupervised" or somesuch).
Also, see the HMM functions available in Matlab that do the same ...
Good luck,
-Aaron
--
Aaron J. Mackey, Ph.D.
Dept. of Biology, Goddard 212
University of Pennsylvania email: amackey at pcbi.upenn.edu
415 S. University Avenue office: 215-898-1205
Philadelphia, PA 19104-6017 fax: 215-746-6697