[Biojava-l] Packed DNA Symbol List
David Huen
smh1008@cus.cam.ac.uk
Mon, 11 Feb 2002 14:13:52 +0000 (GMT)
On Mon, 11 Feb 2002, Matthew Pocock wrote:
> Cool David! Have you got any stats about the relative performance of
> the raw and packed implementations? The issue with AlphabetIndex and
> ambiguities is my fault. I wrote the implementations not to index
> ambiguities. What do you use an indexer for? I'm happy for you to commit
> away. Thomas? Others?
>
I'm using the indexer to convert symbols into 4-bit values that form the
array.
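
To make the packing concrete, here is a minimal sketch of the idea, not the actual PackedDNASymbolList source. It assumes a one-bit-per-base code (A=1, C=2, G=4, T=8) so that an ambiguity is the OR of its bases and everything fits in 4 bits. The class and method names are hypothetical, and plain ints stand in for whatever the indexer returns.

```java
// Hypothetical sketch: two 4-bit symbol codes stored per byte.
// The code values below are an assumption, not BioJava's own mapping.
final class NibblePackSketch {
    static final int A = 1, C = 2, G = 4, T = 8;

    // Pack an array of 4-bit codes, two per byte (even index in the low nibble).
    static byte[] pack(int[] codes) {
        byte[] packed = new byte[(codes.length + 1) / 2];
        for (int i = 0; i < codes.length; i++) {
            int shift = (i % 2) * 4;               // 0 for even i, 4 for odd i
            packed[i / 2] |= (byte) ((codes[i] & 0x0F) << shift);
        }
        return packed;
    }

    // Read back the 4-bit code stored at symbol position i (0-based).
    static int get(byte[] packed, int i) {
        int shift = (i % 2) * 4;
        return (packed[i / 2] >> shift) & 0x0F;    // mask removes sign extension
    }
}
```

With this layout 200000 symbols need 100000 bytes, and an ambiguity such as A-or-G is simply the code A | G.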
Test setup: 200000 symbols on an Athlon MP (real 1200 MHz, whatever the
BogoHertz are).
Converting a SimpleSymbolList via the constructor into a:
  SimpleSymbolList      79 ms
  PackedDNASymbolList   29 ms  (why is this faster than the above?)
Reading through 200000 symbols sequentially:
  SimpleSymbolList       4 ms
  PackedDNASymbolList   15 ms  (slower, as expected, but I expected it to
                                be even worse than this!)
I have tried reorganising the alphabet index to make the common symbols
come first, but that seems to have a negligible impact on performance
compared to having a bit-to-base mapping. I'm a bit surprised by this; I
suppose it just means that symbol lookup is not a major factor. OTOH,
computing which element of the array to look up is a major factor:
replacing a division and a modulo with two bit ops doubled the performance.
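
For reference, the address computation for two 4-bit values per byte can be written either way. The sketch below uses hypothetical names and is not the committed code. It shows both forms side by side; they agree for every non-negative index, which is all a list position can be.

```java
// Hypothetical sketch: locating a 4-bit value inside a byte array.
final class NibbleIndexSketch {
    // Arithmetic form: a division and a modulo per access.
    static int byteIndexDiv(int i) { return i / 2; }
    static int shiftDiv(int i)     { return (i % 2) * 4; }

    // Bit-op form: a shift and a mask give the same results for i >= 0.
    static int byteIndexBits(int i) { return i >> 1; }
    static int shiftBits(int i)     { return (i & 1) << 2; }
}
```

The two forms differ for negative i (Java's / and % round toward zero, >> rounds down), so the substitution is only safe because positions are never negative. Whether the gain is still as large on a current JVM is something I would measure rather than assume.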
There would be better performance if I could use an entity bigger than a
byte, but JDBC drivers seem to like byte arrays and I'd like to be able to
export/import the associated byte arrays to databases readily.
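
One possible middle ground, sketched here purely as an illustration and not something I have implemented or timed: keep eight 4-bit values per int in memory and convert to and from a byte array only at the database boundary, using java.nio.ByteBuffer. The class and method names are hypothetical, and the byte order is ByteBuffer's default (big-endian), so both directions must use the same code.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: 4-bit values held in an int[], exported as byte[].
final class WordPackSketch {
    // Read the 4-bit code at position i (eight codes per int).
    static int get(int[] words, int i) {
        return (words[i >>> 3] >>> ((i & 7) << 2)) & 0x0F;
    }

    // Overwrite the 4-bit code at position i.
    static void set(int[] words, int i, int code) {
        int shift = (i & 7) << 2;
        words[i >>> 3] = (words[i >>> 3] & ~(0x0F << shift))
                       | ((code & 0x0F) << shift);
    }

    // Copy the words into a byte array suitable for a binary column.
    static byte[] toBytes(int[] words) {
        ByteBuffer buf = ByteBuffer.allocate(words.length * 4);
        buf.asIntBuffer().put(words);   // the view writes into buf's backing array
        return buf.array();
    }

    // Rebuild the words from a byte array produced by toBytes.
    static int[] fromBytes(byte[] bytes) {
        int[] words = new int[bytes.length / 4];
        ByteBuffer.wrap(bytes).asIntBuffer().get(words);
        return words;
    }
}
```

The cost is a full copy on every export or import, so this only pays off if reads greatly outnumber trips to the database.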
Regards,
David Huen