[Biojava-l] High Order HMM

Denis Yuen mucous at gmail.com
Fri Dec 22 20:26:10 UTC 2006


Hi,

I'm new to HMMs and BioJava, so this is probably a dumb question, but I
figure it's better to ask than to sit here puzzled...

From the wiki article
http://www.biojava.org/wiki/BioJava:Tutorial:Dynamic_programming_examples
and the post http://portal.open-bio.org/pipermail/biojava-l/2006-March/005387.html

I get the sense that to create a third-order HMM that reads a protein
sequence and emits symbols (e.g. from an alphabet TriGreek built from
"alpha", "beta", "delta"), you would need to create one state for each
amino acid and associate each state with an OrderNDistribution over a
cross-product alphabet, as in
AlphabetManager.generateCrossProductAlphaFromName("(Protein x Protein x TriGreek)").

So if you walked through a trimer AGF which emitted "alpha", you would
end in the state "F", which uses an OrderNDistribution where the first
protein (in the cross-product alphabet) corresponds to the "A", the
second protein corresponds to the "G", and the last term corresponds
to "alpha".
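
To make that concrete, here is a rough plain-Java sketch of what I
imagine the distribution attached to state "F" boils down to: a lookup
from the two preceding residues to a distribution over the emitted
labels. This is not BioJava code. The class ConditionedEmissions, its
methods, and the probabilities are all made up for illustration, and I
am assuming the conditioning works the way I described above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names come from BioJava.
class ConditionedEmissions {
    // context (the two preceding residues, e.g. "AG")
    //   -> emitted label (e.g. "alpha") -> probability
    private final Map<String, Map<String, Double>> table = new HashMap<>();

    // Record P(label | context).
    void setWeight(String context, String label, double probability) {
        table.computeIfAbsent(context, k -> new HashMap<>())
             .put(label, probability);
    }

    // Look up P(label | context); unseen pairs are treated as probability 0.
    double getWeight(String context, String label) {
        Map<String, Double> dist = table.get(context);
        if (dist == null) {
            return 0.0;
        }
        return dist.getOrDefault(label, 0.0);
    }

    public static void main(String[] args) {
        // Emission table for the state "F", conditioned on the two
        // residues seen before it. The numbers are invented.
        ConditionedEmissions stateF = new ConditionedEmissions();
        stateF.setWeight("AG", "alpha", 0.7);
        stateF.setWeight("AG", "beta", 0.2);
        stateF.setWeight("AG", "delta", 0.1);

        // Walking through A, G, F and emitting "alpha":
        System.out.println(stateF.getWeight("AG", "alpha"));
    }
}
```

In this picture each state carries one such table, keyed on the
residues that came before it.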

Mixing emissions with previous states in the cross-product alphabet
seems odd to me. Is that really how you create a third-order HMM, or
is there a better way?

I'm even more confused about how to define transition weights.

Obviously, I'm wrong about something...  How do you define
states/distributions in a third order HMM?

Thanks


