[Biopython-dev] [Bug 2947] Bio.HMM calculates wrong viterbi path

bugzilla-daemon at portal.open-bio.org bugzilla-daemon at portal.open-bio.org
Fri Feb 4 00:16:39 UTC 2011


------- Comment #5 from pgarland at gmail.com  2011-02-03 19:16 EST -------
FWIW, I think the right thing with respect to begin states is to require the
user to explicitly specify a begin state in the state alphabet, e.g.:
class coin:
    def __init__(self):
        self.begin_state_name = "begin"
        self.letters = ["u", "f"]

Having the user specify the name should reduce the chance of naming conflicts,
and makes it easier for the user to understand what is going on if they print
viterbi_probs, or are trying to debug a problem.
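
To illustrate why a named begin state helps when reading the Viterbi table, here
is a minimal sketch of a log-space Viterbi pass. It is not the Bio.HMM
implementation: the function name, its arguments, and the list-of-dicts table
layout are all assumptions made for illustration, and every probability is
assumed to be strictly positive.

```python
import math


def viterbi_sketch(observations, states, begin_state_name,
                   initial_prob, transition_prob, emission_prob):
    """Return (best_path, table) for a small HMM (hypothetical helper)."""
    if not observations:
        raise ValueError("need at least one observation")

    # Column 0 holds only the user-named begin state, so it is easy to
    # recognise when the table is printed.
    table = [{begin_state_name: 0.0}]
    backpointers = []

    # The first real column is seeded from the explicit initial probabilities.
    table.append({
        s: math.log(initial_prob[s]) + math.log(emission_prob[s][observations[0]])
        for s in states
    })

    for obs in observations[1:]:
        prev = table[-1]
        column, pointers = {}, {}
        for s in states:
            # Pick the predecessor that maximises the path score into s.
            best = max(states,
                       key=lambda p: prev[p] + math.log(transition_prob[p][s]))
            column[s] = (prev[best] + math.log(transition_prob[best][s])
                         + math.log(emission_prob[s][obs]))
            pointers[s] = best
        table.append(column)
        backpointers.append(pointers)

    # Trace back from the best final state.
    path = [max(states, key=lambda s: table[-1][s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, table


# Example with the coin alphabet from above ("u" unfair, "f" fair); the
# transition and emission numbers are made up for the example.
initial = {"u": 0.01, "f": 0.99}
transition = {"u": {"u": 0.9, "f": 0.1}, "f": {"u": 0.1, "f": 0.9}}
emission = {"u": {"H": 0.9, "T": 0.1}, "f": {"H": 0.5, "T": 0.5}}
path, table = viterbi_sketch("HHHHHHHH", ["u", "f"], "begin",
                             initial, transition, emission)
```

Printing table[0] here shows the single "begin" entry, which is the kind of
readability gain I have in mind.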

The user should also be required to explicitly set the initial probabilities.
There should be three methods for this: one that takes a list of initial
probabilities, one that makes all initial states equally probable, and one that
lets the user set the probability for each state individually, e.g.:

MarkovModelBuilder.set_initial_probabilities([0.01, 0.99])
MarkovModelBuilder.set_initial_probability("u", 0.01)

The first and third methods would raise an exception if the probabilities did
not sum to 1.0.
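
A rough sketch of how the three setters might behave follows. The class is a
stand-in, not MarkovModelBuilder itself, and the names
set_equal_initial_probabilities and check_initial_probabilities are
hypothetical. One assumption worth flagging: a single per-state call cannot
know the final sum, so in this sketch the per-state setter only range-checks
its value and the sum is verified by a separate check.

```python
import math


class InitialProbabilitySketch:
    """Stand-in for the proposed builder methods (not Bio.HMM code)."""

    def __init__(self, state_letters):
        self.state_letters = list(state_letters)
        self.initial_prob = {}

    def set_initial_probabilities(self, probabilities):
        """Set every initial probability from a list ordered like the states."""
        if len(probabilities) != len(self.state_letters):
            raise ValueError("need exactly one probability per state")
        if not math.isclose(sum(probabilities), 1.0, abs_tol=1e-9):
            raise ValueError("initial probabilities must sum to 1.0")
        self.initial_prob = dict(zip(self.state_letters, probabilities))

    def set_equal_initial_probabilities(self):
        """Make all states equally probable as initial states."""
        n = len(self.state_letters)
        self.initial_prob = {s: 1.0 / n for s in self.state_letters}

    def set_initial_probability(self, state, probability):
        """Set one state's initial probability; the sum is checked later."""
        if state not in self.state_letters:
            raise KeyError(state)
        if not 0.0 <= probability <= 1.0:
            raise ValueError("probability must lie in [0, 1]")
        self.initial_prob[state] = probability

    def check_initial_probabilities(self):
        """Raise unless every state has a value and the values sum to 1.0."""
        missing = [s for s in self.state_letters if s not in self.initial_prob]
        if missing:
            raise ValueError("no initial probability for: %s" % ", ".join(missing))
        if not math.isclose(sum(self.initial_prob.values()), 1.0, abs_tol=1e-9):
            raise ValueError("initial probabilities must sum to 1.0")


builder = InitialProbabilitySketch(["u", "f"])
builder.set_initial_probability("u", 0.01)
builder.set_initial_probability("f", 0.99)
builder.check_initial_probabilities()  # passes: 0.01 + 0.99 sums to 1.0
```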

Alternatively, the initial probabilities could be specified when defining the
state alphabet:
    def __init__(self):
        self.begin_state_name = "begin"
        self.letters = [{'name': "u", 'init_prob': 0.01},
                        {'name': "f", 'init_prob': 0.99}]

This has the advantage of making the code more concise and readable, because
the state's declaration and specification are kept together. It has the
disadvantage of adding an unnecessary layer of indirection when all the states
have equal initial probabilities. To make things less tedious for the user,
there could either be a flag specifying that all states have an equal initial
probability:
    def __init__(self):
        self.begin_state_name = "begin"
        self.initial_probabilities_equal = True
        self.letters = [{'name': "u"}, {'name': "f"}]

or again, a method could be provided:


Because specifying the begin state name and the initial probabilities would be
required, any of these changes would break the current API.
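
For the alphabet-based variants, the library side would need to turn the
letters list into a state-to-probability mapping. A minimal sketch of that step
is below; the function name and the "equal" parameter (standing in for the
equal-probability flag) are hypothetical and only meant to show the validation
involved.

```python
import math


def initial_probabilities_from_letters(letters, equal=False):
    """Build {state name: initial probability} from a dict-style alphabet."""
    names = [entry["name"] for entry in letters]
    if not names:
        raise ValueError("the state alphabet is empty")

    if equal:
        # Equal-probability flag set: ignore any per-state values.
        return {name: 1.0 / len(names) for name in names}

    missing = [entry["name"] for entry in letters if "init_prob" not in entry]
    if missing:
        raise ValueError("no init_prob given for: %s" % ", ".join(missing))

    probs = {entry["name"]: entry["init_prob"] for entry in letters}
    if not math.isclose(sum(probs.values()), 1.0, abs_tol=1e-9):
        raise ValueError("initial probabilities must sum to 1.0")
    return probs


explicit = initial_probabilities_from_letters(
    [{"name": "u", "init_prob": 0.01}, {"name": "f", "init_prob": 0.99}])
uniform = initial_probabilities_from_letters(
    [{"name": "u"}, {"name": "f"}], equal=True)
```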

Similar features should be provided for users who want to constrain the end
state, but not specifying the end state should not raise an exception.

I agree the variable names "main_state" and "cur_state" are confusing and
should be changed.

