[Biojava-l] Higher order HMMs

Matthew Pocock mrp@sanger.ac.uk
Tue, 23 Jan 2001 13:26:48 +0000


Hi Mark,

We are now entering the twilight zone...

Mark Schreiber wrote:

> Hi,
>
> If I want a HMM to emit hexamers as in a gene finding HMM do I just create
> a new hexamer alphabet and add that to the state or can it be mimicked by
> the way that the transitions are set??

The BioJava HMM toolkit only supports 1st order probabilities over
transitions. That is, the probability of reaching a state is conditional only
upon the immediately preceding state. Other schemes would require a history of
the states visited to be maintained, and this is potentially quite expensive.
The effect can be simulated by producing multiple n'th order states (wow -
that's scary) that project the n'th order state-space into a 1st order one. I
haven't yet found any compelling reason to write the projection code.
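
To make the projection idea concrete, here is a minimal sketch in plain Java
(not BioJava code; the class and method names are made up for illustration).
It assumes a 2nd order model given as a table secondOrder[i][j][k] =
P(next = k | previous = i, current = j) and builds a 1st order matrix over
composite states (i,j), where (i,j) can only move to (j,k).

```java
public class OrderProjection {
    /**
     * Hypothetical helper: projects a 2nd order transition table into a
     * 1st order matrix over composite states. Composite state (i,j) is
     * stored at index i * n + j.
     */
    public static double[][] project(double[][][] secondOrder) {
        int n = secondOrder.length;
        double[][] firstOrder = new double[n * n][n * n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                for (int k = 0; k < n; k++) {
                    // (i,j) -> (j,k) inherits P(k | i, j); every other
                    // composite transition keeps probability 0.
                    firstOrder[i * n + j][j * n + k] = secondOrder[i][j][k];
                }
            }
        }
        return firstOrder;
    }
}
```

The cost is visible in the array sizes: n states become n^2 composite states
for order 2, and n^order in general, which is why this gets expensive quickly.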

Originally, the advance array on emission states would have dealt with the
hexamer issue for you. However, we very quickly discovered (after reading the
durbin-eddy-..... dp book) that you can handle this by building an HMM that
emits hexamers. If you want to look at the sequence as a list of
non-overlapping hexamers, then you have the emission state Distributions emit
over the hexamer alphabet, and you build the hexamer lists using
SymbolListViews.windowedSymbolList(). If you want all overlapping hexamers,
you use SymbolListViews.orderNSymbolList() and OrderNDistribution to model the
probability of the 6th nucleotide of each hexamer conditional on the previous
five.
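
The difference between the two views can be sketched with plain strings. This
is not the BioJava API, just an illustration with made-up names; rejecting a
sequence whose length is not a multiple of the window width is a choice made
for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

public class HexamerViews {
    /** Non-overlapping windows of width w, e.g. ACGTAC GTACGT ... */
    public static List<String> windowed(String seq, int w) {
        if (seq.length() % w != 0) {
            throw new IllegalArgumentException("length must be a multiple of " + w);
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i < seq.length(); i += w) {
            out.add(seq.substring(i, i + w));
        }
        return out;
    }

    /** All overlapping windows of width n, advancing one nucleotide at a time. */
    public static List<String> overlapping(String seq, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= seq.length(); i++) {
            out.add(seq.substring(i, i + n));
        }
        return out;
    }

    /** Splits a window into {context, last symbol}, e.g. ACGTAC -> {ACGTA, C}. */
    public static String[] contextAndLast(String window) {
        int k = window.length() - 1;
        return new String[] { window.substring(0, k), window.substring(k) };
    }
}
```

In the overlapping case each window shares five nucleotides with its
neighbour, so only the last symbol is new; that is the part a conditional
distribution over (context, last symbol) has to score.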

We have tried to make sure that any model can always be represented in the 1st
order form so that the DP engine can be relatively simple. I think that it
also turns out that some sets of probabilities can be cached more efficiently
in 1st order form.

> Also if I want different states to emit pentamers and some to emit
> triplets can they be combined in the same model using different alphabets?

Here you define the model in terms of pentamers, and some order 5 Distribution
instances are degenerate for the first two nucleotides.
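
One possible reading of "degenerate", sketched in plain Java with made-up
names: a triplet state is written as a pentamer distribution in which the
first two nucleotides are uniform and only the last three carry information.
This is an assumption about the intent, not BioJava code.

```java
import java.util.HashMap;
import java.util.Map;

public class DegeneratePentamer {
    private static final char[] DNA = {'A', 'C', 'G', 'T'};

    /**
     * Probability of a pentamer under a state that really models triplets.
     * Assumption of this sketch: the first two nucleotides are uniform
     * (4 * 4 = 16 equally likely prefixes), so they contribute 1/16.
     */
    public static double pentamerProb(String pentamer, Map<String, Double> tripletProbs) {
        if (pentamer.length() != 5) {
            throw new IllegalArgumentException("expected a pentamer");
        }
        Double p = tripletProbs.get(pentamer.substring(2));
        return (p == null ? 0.0 : p) / 16.0;
    }

    /** Uniform triplet distribution, used here only as example input. */
    public static Map<String, Double> uniformTriplets() {
        Map<String, Double> m = new HashMap<>();
        for (char a : DNA) {
            for (char b : DNA) {
                for (char c : DNA) {
                    m.put("" + a + b + c, 1.0 / 64.0);
                }
            }
        }
        return m;
    }
}
```

Pentamers that differ only in their first two positions get the same
probability, so pentamer and triplet states can share one pentamer alphabet.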

>
>
> Mark
>

There may be some models that can't be represented quickly in this way, but I
think that all can be transformed algorithmically. If the pent/trip model is
more than an idle curiosity, then we can knock up an HMM implementation that
1st orderises any arbitrary model.

Matthew

>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Mark Schreiber                  Ph: 64 3 4797875
> Rm 218                          email mark_s@sanger.otago.ac.nz
> Department of Biochemistry      email m.schreiber@clear.net.nz
> University of Otago
> PO Box 56
> Dunedin
> New Zealand
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~