[Biojava-l] HMM's - Attempting some fancy stuff

Todd Riley toddri at eden.rutgers.edu
Thu Mar 23 21:59:23 UTC 2006


Hello,

After successfully implementing some TFBS search models using the 
ProfileHMM and DP classes, I am ready to attempt some fancier stuff that 
is going to require some serious coding.  Before I begin, I thought that 
I might pose some questions to the BioJava users/programmers who have 
some experience with and/or interest in the BioJava HMM classes.  I want 
to be sure to implement features in a way that maximizes usability as 
simply as possible.

Questions:

1. Many of the TFBSs that I am modeling are palindromic or repetitive.  
I wish to tie transition and emission distributions together (as prior 
knowledge) during training in order to enforce a palindromic and/or 
repetitive pattern, and thus also greatly reduce the parameter space.

Example: A p53 TFBS is palindromic and repetitive.  A 20-column Profile 
HMM can be greatly reduced to an HMM with the match-state topology 
1 2 3 4 5 C(5) C(4) C(3) C(2) C(1) 1 2 3 4 5 C(5) C(4) C(3) C(2) C(1), 
where C(n) is a DNA-complement view over emission distribution n.  With 
this model, I now have only 5 match-state emission distributions to 
train instead of 20.  There are also far fewer transition distributions 
to train if I require that the transition distribution for a->b is 
shared with the one for b->a, or for C(b)->C(a), traversed in the 
opposite direction.
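
To make the parameter count concrete, here is a minimal sketch in plain 
Java (no BioJava classes) that lays out the tied match-state columns.  
The class name TiedTopology and its method are hypothetical, chosen only 
for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: lays out tied match-state columns for a palindromic, repeated site. */
class TiedTopology {

    /**
     * Builds the column labels for `repeats` copies of a half-site of
     * length `halfSite` followed by its reverse complement.
     * A plain number n means "emission distribution n"; C(n) means
     * "complement view over distribution n".
     */
    static List<String> columns(int halfSite, int repeats) {
        List<String> cols = new ArrayList<>();
        for (int r = 0; r < repeats; r++) {
            // Forward half-site: 1 2 ... halfSite
            for (int i = 1; i <= halfSite; i++) {
                cols.add(Integer.toString(i));
            }
            // Reverse-complement half-site: C(halfSite) ... C(1)
            for (int i = halfSite; i >= 1; i--) {
                cols.add("C(" + i + ")");
            }
        }
        return cols;
    }
}
```

Calling TiedTopology.columns(5, 2) gives the 20 columns listed above, 
all of which refer back to just 5 trainable emission distributions.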

I wish to implement this in a fashion that does not require any changes 
to the current Viterbi, forward, Baum-Welch, etc. algorithms, or to the 
DP class.

I have already started writing classes that provide a view (or 
complement view) over an existing distribution.  My plan is to use these 
views as a means to correlate emission and transition distributions from 
and between different columns in the Profile HMM.
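
The idea of a complement view can be sketched roughly as follows.  This 
is plain Java and deliberately does not use the BioJava Distribution 
interface; Base, BaseDistribution, SimpleBaseDistribution and 
ComplementView are all hypothetical names for illustration.  The point 
is only that the view owns no parameters: every read and write is 
forwarded to the parent with the base complemented, so training either 
one updates both.

```java
import java.util.EnumMap;
import java.util.Map;

/** DNA alphabet with Watson-Crick complements. */
enum Base {
    A, C, G, T;

    Base complement() {
        switch (this) {
            case A: return T;
            case C: return G;
            case G: return C;
            default: return A; // T pairs with A
        }
    }
}

/** Hypothetical minimal distribution interface (not the BioJava one). */
interface BaseDistribution {
    double getWeight(Base b);
    void setWeight(Base b, double w);
}

/** Ordinary distribution that owns its parameters; starts uniform. */
class SimpleBaseDistribution implements BaseDistribution {
    private final Map<Base, Double> weights = new EnumMap<>(Base.class);

    SimpleBaseDistribution() {
        for (Base b : Base.values()) {
            weights.put(b, 0.25);
        }
    }

    public double getWeight(Base b) { return weights.get(b); }

    public void setWeight(Base b, double w) { weights.put(b, w); }
}

/** View with no parameters of its own: delegates to the parent, complementing the base. */
class ComplementView implements BaseDistribution {
    private final BaseDistribution parent;

    ComplementView(BaseDistribution parent) { this.parent = parent; }

    public double getWeight(Base b) { return parent.getWeight(b.complement()); }

    public void setWeight(Base b, double w) { parent.setWeight(b.complement(), w); }
}
```

With this shape, setting the weight of A to 0.7 on the parent makes the 
view report 0.7 for T, and a write through the view lands on the 
complementary base of the parent.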

Has anyone ever tried this or thought of trying this?  Any ideas about 
how to implement this could be very useful.

2.  I wish to use more complicated background models than just a 
0th-order background distribution.  I would like to use a Dirichlet 
mixture and/or higher-order Markov models.  Has anyone looked into this?  
Any ideas as to how to implement this in the current release?
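
For reference, a higher-order Markov background in the sense meant here 
could look roughly like the sketch below: the probability of each base 
is conditioned on the preceding `order` bases.  This is plain Java with 
no BioJava classes; MarkovBackground and its methods are hypothetical 
names, and the pseudocount smoothing is an assumption rather than 
anything prescribed by the library.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical order-k Markov background model over A, C, G, T. */
class MarkovBackground {
    private static final String ALPHABET = "ACGT";

    private final int order;
    private final double pseudocount; // must be > 0 so unseen contexts stay defined
    private final Map<String, double[]> counts = new HashMap<>();

    MarkovBackground(int order, double pseudocount) {
        if (order < 0 || pseudocount <= 0.0) {
            throw new IllegalArgumentException("order >= 0 and pseudocount > 0 required");
        }
        this.order = order;
        this.pseudocount = pseudocount;
    }

    /** Counts each base together with the `order` bases that precede it. */
    void train(String seq) {
        for (int i = order; i < seq.length(); i++) {
            int sym = ALPHABET.indexOf(seq.charAt(i));
            if (sym < 0) {
                continue; // skip ambiguity codes such as N
            }
            String context = seq.substring(i - order, i);
            counts.computeIfAbsent(context, k -> new double[4])[sym] += 1.0;
        }
    }

    /** Smoothed estimate of P(symbol | context). */
    double probability(String context, char symbol) {
        int sym = ALPHABET.indexOf(symbol);
        if (sym < 0 || context.length() != order) {
            throw new IllegalArgumentException("bad symbol or context length");
        }
        double[] c = counts.getOrDefault(context, new double[4]);
        double total = 0.0;
        for (double v : c) {
            total += v;
        }
        return (c[sym] + pseudocount) / (total + 4 * pseudocount);
    }
}
```

With order 0 the context is the empty string and this reduces to the 
usual 0th-order background.  Wiring such a model into the existing DP 
code is the open part of the question; the sketch only pins down what 
the model itself computes.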

-Todd




