[Biojava-dev] ModelInState fixed?

David Huen smh1008 at cus.cam.ac.uk
Tue Apr 15 13:17:28 EDT 2003


On Tuesday 15 Apr 2003 12:05 pm, Matthew Pocock wrote:

>
> Now for the next one - states that emit more than one
> symbol. At the moment, one state emits 1 symbol at a
> time. This makes the code simple. It sucks for things
> like aligning DNA to protein as the DNA inserts want
> to be nucleotides but the DNA-protein matches want to
> be codons. This can be fixed. The advance arrays don't
> need to contain values of just 0 or 1 - they could for
> example be 3. This has a knock-on for the emission
> alphabet in that now it emits both nucleotides and
> trinucleotides, but that's fixable. To make this work,
> we need to update the DP cursors to be aware that they
> have to store more than just the last column.
>
The emission distribution would have to be over a compound alphabet too,
e.g. (DNA x Protein) or, of more interest to me, ((DNA x DNA x DNA) x
(DNA x DNA x DNA)). Under these circumstances, the model's alphabet needs to
allow for the possibility that a state's alphabet may be of a higher order
than the model alphabet.
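
To make the cursor point concrete, here is a rough plain-Java sketch of a
single-sequence Viterbi recursion in which one state advances by 1
(a nucleotide) and another by 3 (a codon, i.e. one element of
DNA x DNA x DNA). It does not use any BioJava classes: State,
emissionLogScore and bestLogScore are hypothetical names, the scores are
made-up numbers, and transition scores are left out for brevity. The part
that matters is the rolling buffer, which has to keep as many previous
columns as the largest advance rather than just the last one.

```java
/** Toy Viterbi over one sequence where a state may consume 1 or 3 symbols. */
class MultiAdvanceViterbi {

    /** Hypothetical state: a name plus the number of symbols it consumes. */
    static final class State {
        final String name;
        final int advance;
        State(String name, int advance) {
            this.name = name;
            this.advance = advance;
        }
    }

    // Illustrative states only: a nucleotide insert and a codon match.
    static final State[] STATES = {
        new State("insert", 1),
        new State("match", 3)
    };

    /** Made-up emission scores (assumption): the match state favours ATG. */
    static double emissionLogScore(State s, String emitted) {
        if (s.advance == 3) {
            // Distribution over the 64 codons: 0.5 on ATG, the rest shared.
            return emitted.equals("ATG") ? Math.log(0.5) : Math.log(0.5 / 63);
        }
        return Math.log(0.25); // uniform over the four nucleotides
    }

    /** Best log-score of any parse of dna into 1-symbol and 3-symbol emissions. */
    static double bestLogScore(String dna) {
        int maxAdvance = 0;
        for (State s : STATES) {
            maxAdvance = Math.max(maxAdvance, s.advance);
        }
        // Rolling "cursor": the current column plus the last maxAdvance columns.
        int window = maxAdvance + 1;
        double[] column = new double[window];
        column[0] = 0.0; // empty prefix has log-score 0

        for (int i = 1; i <= dna.length(); i++) {
            double best = Double.NEGATIVE_INFINITY;
            for (State s : STATES) {
                if (i < s.advance) {
                    continue; // not enough symbols consumed yet for this state
                }
                String emitted = dna.substring(i - s.advance, i);
                // Look back s.advance columns, not always exactly one.
                double candidate = column[(i - s.advance) % window]
                        + emissionLogScore(s, emitted);
                best = Math.max(best, candidate);
            }
            column[i % window] = best;
        }
        return column[dna.length() % window];
    }

    public static void main(String[] args) {
        System.out.println(bestLogScore("ATGA"));
    }
}
```

With only the last column stored, the match state would have nothing to look
back to at position i - 3, which is why the cursors need a deeper history.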

Regards,
David Huen


