[Biojava-l] RE: Bug in HashedAlphabetIndex??

Matthew Pocock mrp@sanger.ac.uk
Wed, 07 Mar 2001 12:24:54 +0000


Hi Mark,

This looks like something I should sort out.

If you build an alphabet that represents the cross-product of other 
alphabets, its size is the product of the sizes of the alphabets you 
combine. This can get very large, especially for alignments 
(protein^10 = 20^10 = 1.024e13 symbols). In effect, this is the same 
issue that makes alignment algorithms computationally expensive when 
aligning any reasonable number of sequences simultaneously. We can't 
be expected to hold that many objects in memory, so there are some 
optimized implementations of FiniteAlphabet that attempt to make 
symbols 'appear' when needed and vanish when discarded. There is 
obviously something wrong in the magic.
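
To make the arithmetic and the 'appear when needed' idea concrete, here 
is a rough sketch in plain Java. It is not the BioJava implementation: 
the class CrossProductSketch and the methods size and symbolAt are 
made-up names for illustration, and symbols are stood in for by plain 
chars. It only shows that a tuple can be computed from its index on 
demand instead of being stored.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names exist in BioJava.
final class CrossProductSketch {

    // Number of symbols in the cross product of `order` copies of an
    // alphabet with `alphabetSize` symbols, i.e. alphabetSize^order.
    static long size(int alphabetSize, int order) {
        long n = 1;
        for (int i = 0; i < order; i++) {
            n = Math.multiplyExact(n, (long) alphabetSize); // throws on overflow
        }
        return n;
    }

    // Build the tuple with the given index on demand. The index is read
    // as a base-k number (k = alphabet.length), most significant
    // position first, so nothing has to be stored up front.
    static List<Character> symbolAt(char[] alphabet, int order, long index) {
        if (index < 0 || index >= size(alphabet.length, order)) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        Character[] tuple = new Character[order];
        for (int pos = order - 1; pos >= 0; pos--) {
            tuple[pos] = alphabet[(int) (index % alphabet.length)];
            index /= alphabet.length;
        }
        return Arrays.asList(tuple);
    }

    public static void main(String[] args) {
        char[] dna = {'a', 'c', 'g', 't'};
        System.out.println(size(20, 10));         // protein^10
        System.out.println(symbolAt(dna, 3, 27)); // one DNA triplet
    }
}
```

A sparse implementation would presumably also need to cache or 
canonicalise the tuples it hands out so that the same index always 
yields an equal symbol; the sketch leaves that out.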

I'll get back to you when it's fixed.

Matthew

Schreiber, Mark wrote:

> Actually, after looking in the debugger I find that the FiniteAlphabet
> produced by the statement
> 
>   //create a cross product of N dna alphabets
>   FiniteAlphabet nOrderAlpha =
>     (FiniteAlphabet) AlphabetManager.getCrossProductAlphabet(
>       Collections.nCopies(order.intValue(), DNATools.getDNA())
>     );
> 
> is very different depending on the value returned by order.intValue(). If
> it is 3 then a shiny happy SimpleCrossProduct object is returned; if it is
> larger than 4 a SparseCrossProduct object is returned??
> 
> Is this a "feature"??
> 
> Mark