[Biojava-l] HMM's - Attempting some fancy stuff
Todd Riley
toddri at eden.rutgers.edu
Thu Mar 23 21:59:23 UTC 2006
Hello,
After successfully implementing some TFBS search models using the
ProfileHMM and DP classes, I am ready to attempt some fancier stuff that
is going to require some serious coding. Before I begin, I thought that
I might pose some questions to the BioJava users/programmers who have
some experience with and/or interest in the BioJava HMM classes. I want
to be sure to implement these features in the simplest way that
maximizes usability.
Questions:
1. Many of the TFBSs that I am modeling are palindromic or
repetitive. I wish to tie transition and emission distributions together
(as prior knowledge) during training in order to enforce a palindromic
and/or repetitive pattern and thus also greatly reduce the parameter space.
Example: A p53 TFBS is palindromic and repetitive. A 20-column Profile
HMM can be greatly reduced to an HMM with the match-state topology of
1 2 3 4 5 C(5) C(4) C(3) C(2) C(1) 1 2 3 4 5 C(5) C(4) C(3) C(2) C(1),
where C() means DNA complement. Notice that with this model, I now have
only 5 match-state emissions as opposed to 20 to train. (C(n) is a
complement view over distribution n). There are also far fewer
transition distributions to train if I require that the transitions
a->b share parameters with the mirrored transitions b->a or C(b)->C(a),
which run in the opposite direction.
I wish to implement this in a fashion that does not require any changes
to the current Viterbi, forward, Baum-Welch, etc. algorithms, or to the
DP class.
I have already started writing classes that provide a view (or
complement view) over an existing distribution. My plan is to use these
views as a means to correlate emission and transition distributions from
and between different columns in the Profile HMM.
Has anyone ever tried this or thought of trying this? Any ideas about
how to implement this could be very useful.
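
To make the idea concrete, here is a minimal sketch of a complement view
and of the 20-column p53 layout built from 5 trainable distributions. It
is plain Java and deliberately does not use the BioJava Distribution or
ProfileHMM classes; every name in it (EmissionDist, SimpleDist,
ComplementView, buildColumns) is hypothetical and only illustrates the
sharing scheme. A real implementation would presumably have to wrap the
BioJava interfaces instead.

```java
import java.util.EnumMap;
import java.util.Map;

public class ComplementViewSketch {

    /** DNA symbols with their Watson-Crick complements. */
    enum Base {
        A, C, G, T;

        Base complement() {
            switch (this) {
                case A: return T;
                case C: return G;
                case G: return C;
                default: return A; // T pairs with A
            }
        }
    }

    /** Hypothetical read-only emission distribution (not a BioJava type). */
    interface EmissionDist {
        double weight(Base b);
    }

    /** A trainable distribution that owns its weights. */
    static final class SimpleDist implements EmissionDist {
        private final Map<Base, Double> weights = new EnumMap<>(Base.class);

        SimpleDist(double a, double c, double g, double t) {
            weights.put(Base.A, a);
            weights.put(Base.C, c);
            weights.put(Base.G, g);
            weights.put(Base.T, t);
        }

        @Override
        public double weight(Base b) {
            return weights.get(b);
        }

        void setWeight(Base b, double w) {
            weights.put(b, w);
        }
    }

    /** C(n): stores nothing, every lookup is delegated to the parent. */
    static final class ComplementView implements EmissionDist {
        private final EmissionDist parent;

        ComplementView(EmissionDist parent) {
            this.parent = parent;
        }

        @Override
        public double weight(Base b) {
            return parent.weight(b.complement());
        }
    }

    /** Lays out 1..n, C(n)..C(1), repeated 'repeats' times. */
    static EmissionDist[] buildColumns(EmissionDist[] half, int repeats) {
        int n = half.length;
        EmissionDist[] cols = new EmissionDist[2 * n * repeats];
        int k = 0;
        for (int r = 0; r < repeats; r++) {
            for (int i = 0; i < n; i++) {
                cols[k++] = half[i];                     // shared, not copied
            }
            for (int i = n - 1; i >= 0; i--) {
                cols[k++] = new ComplementView(half[i]); // reverse complement
            }
        }
        return cols;
    }

    public static void main(String[] args) {
        SimpleDist[] half = new SimpleDist[5];
        half[0] = new SimpleDist(0.7, 0.1, 0.1, 0.1);
        for (int i = 1; i < half.length; i++) {
            half[i] = new SimpleDist(0.25, 0.25, 0.25, 0.25);
        }
        EmissionDist[] cols = buildColumns(half, 2);
        System.out.println(cols.length + " columns backed by "
                + half.length + " trainable distributions");
        System.out.println("P(T) in column 10 = " + cols[9].weight(Base.T));
    }
}
```

Because the views hold no state of their own, any update made to one of
the 5 underlying distributions is visible in all 4 columns that refer to
it. The sketch does not address how training counts gathered at a view
column would be routed back to the parent; that part would still need to
be designed.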
2. I wish to use more complicated background models than just a
zeroth-order background distribution. I would like to use a Dirichlet mixture
and/or higher order Markov models. Has anyone looked into this? Any
ideas as to how to implement this in the current release?
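
For reference, the higher-order Markov part of this could look roughly
like the following sketch: an order-k background model over DNA that is
estimated from counts with a pseudocount. It is plain Java, independent
of BioJava, and the class and method names are hypothetical. It covers
only the Markov case, not the Dirichlet mixture, and it says nothing
about how such a model would be plugged into the DP class.

```java
import java.util.HashMap;
import java.util.Map;

public class MarkovBackgroundSketch {
    private static final String ALPHABET = "ACGT";

    private final int order;           // k: number of preceding symbols used as context
    private final double pseudocount;  // added to every count to avoid zero probabilities
    private final Map<String, double[]> counts = new HashMap<>();

    public MarkovBackgroundSketch(int order, double pseudocount) {
        if (order < 0 || pseudocount <= 0.0) {
            throw new IllegalArgumentException("order >= 0 and pseudocount > 0 required");
        }
        this.order = order;
        this.pseudocount = pseudocount;
    }

    /** Counts each symbol together with the k symbols that precede it. */
    public void train(String seq) {
        for (int i = order; i < seq.length(); i++) {
            int sym = ALPHABET.indexOf(seq.charAt(i));
            if (sym < 0) {
                continue; // skip symbols outside ACGT
            }
            String context = seq.substring(i - order, i);
            counts.computeIfAbsent(context, c -> new double[ALPHABET.length()])[sym] += 1.0;
        }
    }

    /** P(symbol | context), smoothed with the pseudocount. */
    public double prob(String context, char symbol) {
        if (context.length() != order) {
            throw new IllegalArgumentException("context must have length " + order);
        }
        int sym = ALPHABET.indexOf(symbol);
        if (sym < 0) {
            throw new IllegalArgumentException("unknown symbol: " + symbol);
        }
        double[] c = counts.getOrDefault(context, new double[ALPHABET.length()]);
        double total = 0.0;
        for (double v : c) {
            total += v;
        }
        return (c[sym] + pseudocount) / (total + ALPHABET.length() * pseudocount);
    }

    /** Log-probability of seq; the first k symbols are not scored. */
    public double logProb(String seq) {
        double lp = 0.0;
        for (int i = order; i < seq.length(); i++) {
            lp += Math.log(prob(seq.substring(i - order, i), seq.charAt(i)));
        }
        return lp;
    }

    public static void main(String[] args) {
        MarkovBackgroundSketch bg = new MarkovBackgroundSketch(1, 1.0);
        bg.train("ACACACAC");
        System.out.println("P(C | A) = " + bg.prob("A", 'C'));
        System.out.println("log P(ACGT) = " + bg.logProb("ACGT"));
    }
}
```

With order 0 this reduces to the usual single background distribution,
so the zeroth-order case is a special case of the same class.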
-Todd