[Biojava-l] RE: Bug in HashedAlphabetIndex??

Matthew Pocock mrp@sanger.ac.uk
Thu, 08 Mar 2001 11:07:35 +0000


Schreiber, Mark wrote:

>> -----Original Message-----
>> From: Matthew Pocock [mailto:mrp@sanger.ac.uk]
>> Sent: Thursday, March 08, 2001 5:00 AM
>> To: Schreiber, Mark
>> Cc: 'biojava-l@biojava.org'
>> Subject: Re: [Biojava-l] RE: Bug in HashedAlphabetIndex??
>> 
>> 
>> Hi Mark,
>> 
>> I've fixed this on the main trunk. Thomas, could you port this to the 
>> 1.1 branch?
>> 
> 
> 
> Great, seems to work now. What was the problem??
> 

We had originally copied the implementation of the iterator method 
directly from SimpleCrossProductAlphabet, which stores symbols in a Map 
and uses map.values().iterator(). SparseCrossProductAlphabet populates a 
Map as symbols are required, spreading the initialization cost and, 
for simple cases (like alignments), vastly reducing the number of 
symbols actually instantiated. This meant that the symbols iterator from 
an un-populated alphabet didn't iterate over all symbols, just the ones 
that had been explicitly asked for. This is now fixed by providing a 
nifty implementation of Iterator - see the source code for more 
details. The same trick can be used to build alphabet indexers for 
large alphabets (any takers?).
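
For anyone who wants the general idea without digging through the source, 
here is a rough sketch of one way to iterate over every member of a cross 
product without relying on a lazily populated Map. It is not the code that 
was checked in: the class name CrossProductIterator is made up for this 
example, and plain Strings stand in for Symbol objects. It keeps one 
counter per component alphabet and steps them like an odometer, building 
each tuple only when it is asked for.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch, not the BioJava implementation.
// Strings stand in for the symbols of each component alphabet.
final class CrossProductIterator implements Iterator<List<String>> {
    private final List<List<String>> factors; // one symbol list per component alphabet
    private final int[] counters;             // odometer: current index into each factor
    private boolean hasNext;

    CrossProductIterator(List<List<String>> factors) {
        this.factors = factors;
        this.counters = new int[factors.size()];
        // This sketch treats "no factors" and "any empty factor" as an empty product.
        boolean nonEmpty = !factors.isEmpty();
        for (List<String> f : factors) {
            if (f.isEmpty()) {
                nonEmpty = false;
            }
        }
        this.hasNext = nonEmpty;
    }

    @Override
    public boolean hasNext() {
        return hasNext;
    }

    @Override
    public List<String> next() {
        if (!hasNext) {
            throw new NoSuchElementException();
        }
        // Build the tuple for the current counter positions on demand.
        List<String> tuple = new ArrayList<>(counters.length);
        for (int i = 0; i < counters.length; i++) {
            tuple.add(factors.get(i).get(counters[i]));
        }
        advance();
        return tuple;
    }

    // Step the rightmost counter; on overflow reset it and carry to the left.
    private void advance() {
        for (int i = counters.length - 1; i >= 0; i--) {
            if (++counters[i] < factors.get(i).size()) {
                return;
            }
            counters[i] = 0;
        }
        hasNext = false; // every counter wrapped around: all tuples visited
    }
}
```

Because the iterator only consults the component alphabets, it visits 
every combination whether or not any of them has been requested before, 
and it never holds more than one tuple at a time.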

M