[Biopython-dev] [Bug 2947] Bio.HMM calculates wrong viterbi path

bugzilla-daemon at portal.open-bio.org bugzilla-daemon at portal.open-bio.org
Sun Feb 6 03:23:39 UTC 2011


http://bugzilla.open-bio.org/show_bug.cgi?id=2947





------- Comment #12 from pgarland at gmail.com  2011-02-05 22:23 EST -------
(In reply to comment #11)
> Sounds good.
> 
> (I had been thinking about trying to preserve backward compatibility for
> existing clients of this class. If we require that the caller sets a begin
> state then all existing clients will break since none of them currently does
> that. But the previous fix has already broken compatibility in any case, and
> that was probably necessary since prior to the fix, the results were
> incorrect.)

I don't think it's worth it to worry about preserving complete backward
compatibility. Right now there are two classes of code:

1) Code that manually sets up a begin state and the appropriate transitions.
All these people would need to do is add one line of code specifying the begin
state, and the rest of their code would work as before. For these users, we
could print an error message instructing them to set the begin_state_name
variable (and document the change too!).

2) Code that does not set up a begin state, as in the bug report. Even with the
applied bug fix, this code only returns a correct state sequence when all
possible start states should be equally probable. In all other cases the users
are possibly getting an incorrect result without being aware of it. To my mind,
this is worse than breaking backward compatibility. We could maintain backward
compatibility by having a default model for the initial state (e.g. equally
probable, or assign random probabilities), but unless that's the model the user
should be assuming for their sequence, they'll still be silently returned an
incorrect result.

> A possible variation would be to handle the transition from the begin state to
> the first real state with special-case code, so that the begin state would not
> be included in the set of real states. The upside would be that the methods you
> mention would not have to change, and we wouldn't be cluttering the state
> alphabet with a begin state that isn't real, which I think was a concern
> mentioned in comment #6 (if I understood it properly). The downside is having
> to add that special-case code. Not sure yet whether this is a good idea or not. 
> 
> Walter

I hadn't thought of that approach. It could be a good way to go. I think the
tradeoffs would be:

A) Of the existing code, changes would be localized to the viterbi method,
which would become slightly more complex.
B) This approach makes it trivial to guarantee that no state can transition to
the begin state.
C) One new public method would have to be added, for users to set initial
probabilities.
D) Having to use the new method would require more, though not complex, changes
to existing user code, but would have the benefit of making it as explicit as
possible how the model is initialized.

All in all, your idea of keeping the begin state separate looks like the way to
go.

~ Phillip

> (In reply to comment #10)
> > (In reply to comment #8)
> > > I'll volunteer to do all of that (OK with you, Phillip?).
> > > 
> > > Walter
> > 
> > Sure. WRT my earlier comment, I realized that it's simpler for both the
> > implementer and the user if the only user-visible change necessary to specify
> > begin states is to add a variable to HiddenMarkovBuilder to hold the name of
> > the begin state, and then let users use set_transition_score to specify
> > transition probabilities from begin states. Then the relevant methods, e.g.
> > _all_blank, allow_transition, allow_all_transitions, set_transition_score, etc
> > have to be altered to forbid transitions to, or emissions from the begin state.
> > And get_markov_model would raise an exception if a begin state hasn't been
> > specified or if there isn't at least one transition from the begin state. 
> > 
> > So all users would have to do is (using the example from the bug report):
> > 
> > ...
> > build.begin_state_name = "begin"
> > build.set_transition_score("begin", "u", 0.01)
> > build.set_transition_score("begin", "f", 0.99)
> > ...
> > 
> > ~Phillip
> > 
> 


-- 
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.



More information about the Biopython-dev mailing list