[Biojava-l] equals() method for SymbolList

Keith James kdj@sanger.ac.uk
11 Oct 2002 16:47:08 +0100


>>>>> "Phillip" == Phillip Lord <p.lord@russet.org.uk> writes:

>>>>> "Matthew" == Matthew Pocock <matthew_pocock@yahoo.co.uk> writes:

    Matthew> SymbolList should be behaving like a string over its
    Matthew> symbols. It is silly if it doesn't do this. Hash codes
    Matthew> should really be calculated in a different (but
    Matthew> sequence-dependent) way to avoid scanning the whole of
    Matthew> very large sequences just to do a hash lookup. Anyone got
    Matthew> any ideas?

    Phillip> Just make the hash out of, say, the first 10 elements in
    Phillip> the list. The hashcode is not meant to be unique for all
    Phillip> sequences, it's just a performance enhancement. So long
    Phillip> as equals returns false for different sequences, then
    Phillip> there is no problem.

In a similar vein, the array sampling techniques at

http://www273.pair.com/med/columns/Durable6.html

would work, but equals() would get called more often for sequences
with similar base composition, since their sampled hash codes are more
likely to collide. How about hashing the first 10 elements and then
adding in the values at just the indices that are powers of two?
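
A rough sketch of that idea in Java, to make it concrete. It is not
BioJava code: a plain java.util.List stands in for SymbolList, and the
names SampledHash, sampledHash and PREFIX are made up for illustration.
Indices are 0-based here, and the multiplier 31 is just the usual
convention, not something proposed in this thread.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical helper, not part of BioJava. A List stands in for SymbolList.
final class SampledHash {
    // How many leading elements are always hashed (the "first 10").
    private static final int PREFIX = 10;

    // Hashes the first PREFIX elements, then the elements at 0-based
    // indices 16, 32, 64, ... The powers of two below 10 (1, 2, 4, 8)
    // are already covered by the prefix, so sampling resumes at 16.
    static int sampledHash(List<?> symbols) {
        int n = symbols.size();
        int h = 17;
        for (int i = 0; i < n && i < PREFIX; i++) {
            h = 31 * h + Objects.hashCode(symbols.get(i));
        }
        // A long counter avoids int overflow when doubling near 2^31.
        for (long i = 16; i < n; i *= 2) {
            h = 31 * h + Objects.hashCode(symbols.get((int) i));
        }
        return h;
    }
}
```

This touches roughly 10 + log2(n) elements however long the list is.
Equal lists always get equal hash codes, so the contract with equals()
holds. Lists that differ only at unsampled positions will collide and
fall through to equals(), which is the accepted cost of sampling.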

Keith

-- 

- Keith James <kdj@sanger.ac.uk> bioinformatics programming support -
- Pathogen Sequencing Unit, The Wellcome Trust Sanger Institute, UK -