[Biojava-l] HMM's - Attempting some fancy stuff

mark.schreiber at novartis.com mark.schreiber at novartis.com
Fri Mar 24 02:28:04 UTC 2006


I think you could model a palindrome with a push-down automaton or similar. 
Alternatively you could use something like an HMM with emission durations, as 
in Borodovsky's GeneMark.hmm programs, but that would require a lot of new 
code for the DP library (good to have though).

To use a Dirichlet mixture as your background you could calculate one and 
give it to a Distribution, although it might be best to implement the 
Distribution interface with a class that generates one for you. To go to 
higher order models you just need a higher order alphabet 
(http://biojava.org/wiki/BioJava:Cookbook:Alphabets:CrossProduct) and 
possibly an OrderNDistribution for background and emission 
(http://biojava.org/wiki/BioJava:CookBook:Distribution:Custom)
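
As a rough illustration of what a higher order model buys you, here is a 
minimal plain-Java sketch of a first-order conditional distribution over 
DNA, where each symbol is scored given the symbol before it. This is the 
same idea as pairing a DNA x DNA cross-product alphabet with a conditional 
distribution, but it deliberately does not use the BioJava API. The class 
name FirstOrderModel, its methods, and the pseudocount value are all 
hypothetical names chosen for this example.

```java
import java.util.Arrays;

// Hypothetical illustration only: not a BioJava class.
final class FirstOrderModel {
    private static final String DNA = "acgt";

    // counts[prev][next]: how often 'next' follows 'prev'.
    private final double[][] counts = new double[4][4];

    // Every cell starts at the pseudocount so unseen pairs keep
    // a non-zero probability.
    FirstOrderModel(double pseudocount) {
        for (double[] row : counts) {
            Arrays.fill(row, pseudocount);
        }
    }

    // Count each adjacent pair of symbols in the training sequence.
    void train(String seq) {
        for (int i = 1; i < seq.length(); i++) {
            counts[index(seq.charAt(i - 1))][index(seq.charAt(i))] += 1.0;
        }
    }

    // P(next | prev), normalised over the four possible next symbols.
    double probability(char prev, char next) {
        double[] row = counts[index(prev)];
        double total = 0.0;
        for (double c : row) {
            total += c;
        }
        return row[index(next)] / total;
    }

    private static int index(char c) {
        int i = DNA.indexOf(Character.toLowerCase(c));
        if (i < 0) {
            throw new IllegalArgumentException("Not a DNA symbol: " + c);
        }
        return i;
    }

    public static void main(String[] args) {
        FirstOrderModel model = new FirstOrderModel(1.0);
        model.train("acacac");
        System.out.println("P(c|a) = " + model.probability('a', 'c'));
        System.out.println("P(a|c) = " + model.probability('c', 'a'));
    }
}
```

The table has 16 cells, one per symbol of the DNA x DNA cross product, 
which is why the parameter count grows quickly with the order of the model.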

- Mark





Todd Riley <toddri at eden.rutgers.edu>
Sent by: biojava-l-bounces at lists.open-bio.org
03/24/2006 07:04 AM

 
        To:     Francois Pepin <fpepin at aei.ca>
        cc:     biojava-l at biojava.org, Mark Schreiber/GP/Novartis at PH
        Subject:        Re: [Biojava-l] HMM's - Attempting some fancy stuff


Yes, I agree that the palindromes are not always identical.  However, 
often my unaligned training data is not complete enough to train the 
model well without some simplification.  So far, I have been using 
cross-validation, sensitivity, and specificity to determine the 
effectiveness of this simplification approach.
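
For readers following the thread, the kind of simplification under 
discussion can be sketched in plain Java as below. It assumes a site 
described by per-position base counts and ties each position to its mirror 
position on the opposite strand, so both halves of the palindrome share one 
set of counts. This is only one possible reading of "associating" the 
distributions, not necessarily the exact scheme used here, and the class 
and method names (PalindromeTying, tie) are hypothetical. No BioJava calls 
are involved.

```java
// Hypothetical illustration only: not a BioJava class.
final class PalindromeTying {
    // Base order is a, c, g, t, so the complement of index b is 3 - b
    // (a <-> t, c <-> g).
    static double[][] tie(double[][] counts) {
        int len = counts.length;
        double[][] tied = new double[len][4];
        for (int i = 0; i < len; i++) {
            int mirror = len - 1 - i; // matching position in the other half
            for (int b = 0; b < 4; b++) {
                // Pool the counts for base b at position i with the counts
                // for the complementary base at the mirror position.
                tied[i][b] = counts[i][b] + counts[mirror][3 - b];
            }
        }
        return tied;
    }

    public static void main(String[] args) {
        // Two-position toy site: position 0 saw 'a' four times,
        // position 1 saw 'c' twice.
        double[][] counts = { {4, 0, 0, 0}, {0, 2, 0, 0} };
        double[][] tied = tie(counts);
        for (double[] column : tied) {
            System.out.println(java.util.Arrays.toString(column));
        }
    }
}
```

After tying, column i is the reverse complement of column len - 1 - i, so 
only about half of the emission parameters remain free. The cost is that 
the two halves can no longer differ in how strongly they are conserved.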

-Todd

Francois Pepin wrote:

>>1. Many of the TFBS sites that I am modeling are palindromic or 
>>repetitive.  I wish to associate transition and emission distributions 
>>(as prior knowledge) during training in order to enforce a palindromic 
>>and/or repetitive pattern and thus also greatly reduce the parameter 
>>space.
>> 
>>
>
>Just as a note, we haven't found this to be ideal, if you have
>sufficient training data. It is often the case that one of the
>palindromes is more conserved than the other, and you would be treating
>them the same way.
>
>Of course, it depends how much of an in-depth study you'll want to be
>doing.
>
>Francois
>
> 
>

_______________________________________________
Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/biojava-l





