[Biojava-l] Generalized HMM in biojava?

Matthew Pocock matthew.pocock at ncl.ac.uk
Mon Jan 23 06:58:41 EST 2006


On Monday 23 January 2006 11:43, wendy wong wrote:
> > OK - so you have a single HMM that emits whole columns of an alignment?
> > Usually, to align three sequences, you would use a 3-head HMM where each
> > head emits one of the sequences.
>
> I am not sure it would work with a 3-head HMM, because here the
> sequences are related to each other by the phylogenetic tree. So if
> the sequence order is the same, the column ACC would have a different
> likelihood than CCA.

So you already have the alignment from a phylogenetic program and you are 
using biojava to compute some other statistic over it?

>
> > You shouldn't be getting exceptions. This is almost certainly a bug.
> > Could you send the stack-trace?
>
> sure, here it is:

Thanks. I am not around until the end of the week. Could somebody take a 
look at this?

> Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
> 	at org.biojava.bio.symbol.LinearAlphabetIndex.buildIndex(LinearAlphabetIndex.java:108)
> 	at org.biojava.bio.symbol.LinearAlphabetIndex.<init>(LinearAlphabetIndex.java:66)
> 	at org.biojava.bio.symbol.AlphabetManager.getAlphabetIndex(AlphabetManager.java:1796)
> 	at edu.cornell.bscb.evopromoter.TestingFunctions.main(TestingFunctions.java:61)
>
> I think I don't need the full alphabet of getDNA(), which has 16
> symbols. I reduced it to 5 (A, T, C, G, N), so I can have a state that
> contains more sites...

While this is a good idea in principle, it will actually be counter-productive 
in BioJava. The DNA alphabet has only 4 'real' symbols: the nucleotides. The 
other symbols (n included) are 'virtual' symbols constructed from sets of the 
'real' symbols. By introducing 'N' as a first-class symbol, you have actually 
grown the problem from 4^n to 5^n, which is probably not what you wanted :-)
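
To put rough numbers on that growth, here is a minimal, library-free sketch.
It deliberately does not call any BioJava API; the class and method names
(AlphabetGrowth, columnStates, subsets) are made up for illustration. It
assumes n is the number of positions combined into one emitted symbol, so an
alphabet of k first-class symbols gives k^n combinations. It also enumerates
the subsets of {A, C, G, T}, which is one plausible reading of where the 16
symbols come from (2^4 subsets, counting the empty set); treat that reading
as an assumption rather than a statement about BioJava internals.

```java
import java.util.ArrayList;
import java.util.List;

public class AlphabetGrowth {

    // Number of distinct combinations of `positions` symbols drawn from an
    // alphabet with `atomicSymbols` first-class symbols: atomicSymbols^positions.
    static long columnStates(int atomicSymbols, int positions) {
        long total = 1;
        for (int i = 0; i < positions; i++) {
            // multiplyExact throws ArithmeticException instead of silently overflowing
            total = Math.multiplyExact(total, (long) atomicSymbols);
        }
        return total;
    }

    // Every subset of the given atomic symbols, each written as a string.
    // Illustrative model only: an ambiguity symbol is treated as the set of
    // atomic symbols it can stand for.
    static List<String> subsets(String atoms) {
        List<String> result = new ArrayList<>();
        for (int mask = 0; mask < (1 << atoms.length()); mask++) {
            StringBuilder members = new StringBuilder();
            for (int i = 0; i < atoms.length(); i++) {
                if ((mask & (1 << i)) != 0) {
                    members.append(atoms.charAt(i));
                }
            }
            result.add(members.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        // Compare 4 atomic symbols against 5 (N promoted to a first-class symbol).
        for (int n = 1; n <= 5; n++) {
            System.out.println("n=" + n
                    + "  4 symbols: " + columnStates(4, n)
                    + "  5 symbols: " + columnStates(5, n));
        }
        System.out.println("subsets of ACGT: " + subsets("ACGT").size());
    }
}
```

The point of the comparison is that the ambiguity symbols cost nothing extra
when they are derived from the 4 atomic symbols, whereas every additional
first-class symbol raises the base of the exponent.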

>
> thanks,
> wendy

Matthew

