[Biojava-l] Generalized HMM in biojava?

wendy wong wendy.wong at gmail.com
Fri Jan 20 17:11:58 EST 2006


Thanks for your help!

> It's not the alphabet that will kill you, but the number of parameters you are
> estimating. Indeed, BioJava should be able to handle alphabets with more than
> 2^32 symbols quite happily. There's an implementation of cross-product
> alphabet designed especially for this case.

What I am trying to do is develop a phylogenetic HMM. Say there are 3
sequences in the alignment; that means each site consists of 3
symbols, and if it is a generalized HMM, each state spans several
sites, say 7. I wrote a test program to see if it works (I just want
to see whether I can factorize a symbol in the state alphabet). When
the number of sites in the state is 5, it works, but when the number
of sites in the state is 7, I get a
java.lang.ArrayIndexOutOfBoundsException (code attached).
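
A rough size check may be relevant to the 5-versus-7 difference. This is
only a sketch under stated assumptions: it assumes the state alphabet is
indexed with a Java int (the index passed to symbolForIndex is an int),
and it has not been checked against where the exception is actually
thrown. The class and method names below are made up for illustration
and are not part of BioJava.

```java
import java.math.BigInteger;

// Hypothetical helper, not part of BioJava: counts the symbols in a
// cross-product alphabet and checks whether that count fits in an int.
class AlphabetSizeCheck {

    // Size of a cross product of `copies` alphabets, each with `base` symbols.
    static BigInteger crossProductSize(long base, int copies) {
        return BigInteger.valueOf(base).pow(copies);
    }

    // True if n symbols can be addressed by a non-negative Java int.
    static boolean fitsInInt(BigInteger n) {
        return n.compareTo(BigInteger.valueOf(Integer.MAX_VALUE)) <= 0;
    }

    public static void main(String[] args) {
        // DNA has 4 symbols; 3 aligned sequences give 4^3 symbols per site.
        BigInteger siteSize = crossProductSize(4, 3);
        System.out.println("site alphabet size: " + siteSize);

        // Compare the two state lengths mentioned in the message.
        for (int length : new int[] {5, 7}) {
            BigInteger stateSize = siteSize.pow(length);
            System.out.println("length " + length + ": " + stateSize
                    + " symbols, fits in int: " + fitsInInt(stateSize));
        }
    }
}
```

With 5 sites the state alphabet has 64^5 = 2^30 symbols, which is still
below Integer.MAX_VALUE, while 7 sites give 64^7 = 2^42 symbols, which is
not. Whether this is what triggers the exception is an assumption.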

Is it because I was not using the alphabet efficiently?

Again, thanks very much for helping!

Wendy

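import java.io.IOException;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

import org.biojava.bio.seq.DNATools;
import org.biojava.bio.symbol.Alphabet;
import org.biojava.bio.symbol.AlphabetIndex;
import org.biojava.bio.symbol.AlphabetManager;
import org.biojava.bio.symbol.AtomicSymbol;
import org.biojava.bio.symbol.FiniteAlphabet;

// Note: "log", MarshalException and ValidationException are defined
// elsewhere in the test program and are not shown here.
public static void main(String[] args)
        throws MarshalException, ValidationException, IOException {

    Alphabet sequenceAlphabet = DNATools.getDNA();
    Set alphabetSet =
            AlphabetManager.getAllSymbols((FiniteAlphabet) sequenceAlphabet);

    // One site = one column of the alignment = one symbol per sequence.
    int no_sequences = 3;
    List siteAlphabetList = Collections.nCopies(no_sequences, sequenceAlphabet);
    Alphabet siteAlphabet =
            AlphabetManager.getCrossProductAlphabet(siteAlphabetList);

    // One state = "length" consecutive sites (5 works, 7 fails).
    int length = 7;
    List stateAlphabetList = Collections.nCopies(length, siteAlphabet);
    Alphabet stateAlphabet =
            AlphabetManager.getCrossProductAlphabet(stateAlphabetList);

    // Look up one state symbol by index and factorize it into its sites.
    AlphabetIndex alphabetIndex =
            AlphabetManager.getAlphabetIndex((FiniteAlphabet) stateAlphabet);
    AtomicSymbol sym = (AtomicSymbol) alphabetIndex.symbolForIndex(3);
    List symList = sym.getSymbols();
    log.info("sym (index=3) is " + sym);
    log.info("sym is composed of:");
    Iterator symIter = symList.iterator();
    while (symIter.hasNext()) {
        log.info(symIter.next());
    }
}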


