[Biojava-l] equals() method for SymbolList

Phillip Lord p.lord@russet.org.uk
11 Oct 2002 16:16:09 +0100


>>>>> "Matthew" == Matthew Pocock <matthew_pocock@yahoo.co.uk> writes:

  Matthew> SymbolList should be behaving like a string over its
  Matthew> symbols. It is silly if it doesn't do this. Hash codes
  Matthew> should really be calculated in a different (but
  Matthew> sequence-dependent) way to avoid scanning the whole of very
  Matthew> large sequences just to do a hash lookup. Anyone got any
  Matthew> ideas?

Just make the hash out of, say, the first 10 elements in the list. The
hashcode is not meant to be unique for all sequences; it's just a
performance enhancement. Equal sequences always share their first 10
elements, so they still get equal hashcodes, and so long as equals
returns false for different sequences, there is no problem.
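A minimal Java sketch of the prefix idea, assuming plain Character values can stand in for symbols. PrefixHashedSeq and the cut-off of 10 are hypothetical illustrations, not BioJava API:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical immutable sequence: equals() compares every symbol,
 * while hashCode() only looks at a short prefix.
 */
final class PrefixHashedSeq {
    private static final int PREFIX = 10; // leading symbols fed into the hash

    private final List<Character> symbols;

    PrefixHashedSeq(String residues) {
        List<Character> tmp = new ArrayList<>();
        for (char c : residues.toCharArray()) {
            tmp.add(c);
        }
        this.symbols = Collections.unmodifiableList(tmp); // no later mutation
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof PrefixHashedSeq)) return false;
        // Full, symbol-by-symbol comparison: different sequences are never equal.
        return symbols.equals(((PrefixHashedSeq) o).symbols);
    }

    @Override
    public int hashCode() {
        // Equal sequences share their prefix, so they always hash the same.
        int n = Math.min(PREFIX, symbols.size());
        return symbols.subList(0, n).hashCode();
    }
}
```

Two sequences that agree on their first 10 symbols but differ later are unequal yet collide on hashCode(). That is allowed by the contract; it just costs an extra equals() call per collision.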

It does mean that if you add a lot of very similar sequences to a hash,
particularly if they are similar in the first bit of the sequence, then
you will get lots of collisions in the hash, which means a pretty
drastic performance knock. The collisions also mean equals gets called
a lot, which is more of a problem if you have long sequences and an
equals method based on the sequence. You could get round this by
sampling elements from positions spread across the whole list, chosen
according to its total length, for instance.
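One way to sample positions spread across the list, sketched in Java under the assumption that symbols are held in a List<Character>. SampledHash and the sample count of 10 are hypothetical:

```java
import java.util.List;

/** Hypothetical helper: hash a fixed number of symbols sampled across the whole list. */
final class SampledHash {
    private static final int SAMPLES = 10; // assumption: sample count is arbitrary

    static int hash(List<Character> symbols) {
        int n = symbols.size();
        int h = n; // mixing in the length also separates sequences of different sizes
        int k = Math.min(SAMPLES, n);
        for (int i = 0; i < k; i++) {
            // Evenly spaced positions 0, n/k, 2n/k, ..., all strictly below n.
            int pos = (int) ((long) i * n / k);
            h = 31 * h + symbols.get(pos).hashCode();
        }
        return h;
    }
}
```

The cost stays fixed at about 10 element reads however long the sequence is, and equal lists still hash equally because they have the same length and the same symbols at every sampled position.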

It's also worth mentioning that if you define hashcode and equals in
this way, put sequence objects into a hash, and then change the
sequence, both methods will return different results from before. This
can't happen with String because it's immutable. At that point the
object sits in the hash under its old hashcode, so lookups for it will
usually fail and things will start going wrong.
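The failure is easy to reproduce with java.util.ArrayList, whose equals/hashCode are content-based in the way a sequence-based SymbolList would be. MutatedKeyDemo is a hypothetical name; only standard-library calls are used:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class MutatedKeyDemo {
    /** Adds key to a fresh HashSet, then mutates its last symbol in place. */
    static Set<List<Character>> insertThenMutate(List<Character> key, char newLast) {
        Set<List<Character>> set = new HashSet<>();
        set.add(key);                      // stored under the key's current hashCode
        key.set(key.size() - 1, newLast);  // mutation changes equals/hashCode results
        return set;
    }

    public static void main(String[] args) {
        List<Character> seq = new ArrayList<>(List.of('A', 'C', 'G', 'T'));
        Set<List<Character>> set = insertThenMutate(seq, 'A'); // ACGT -> ACGA

        // The new hash no longer matches the one recorded at insertion time.
        System.out.println(set.contains(seq));
        // The old hash matches, but equals() against the mutated key now fails.
        System.out.println(set.contains(List.of('A', 'C', 'G', 'T')));
        // The element is still in there; it just can't be found.
        System.out.println(set.size());
    }
}
```

Running main prints false, false and 1: the set still holds the object, but neither the mutated sequence nor the original content can find it.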

On the whole, my inclination would be that a sequence-based
hashcode/equals is something which should only happen when it's what
the programmer really wants, not by default. That is what the
toList() call already gives you.
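A sketch of the opt-in approach, assuming a hypothetical IdentitySeq class rather than BioJava's real SymbolList: equals/hashCode are left as Object's identity versions, and a toList() method hands out a java.util.List snapshot whose equality is content-based:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sequence class: keeps Object's identity equals/hashCode,
 * and offers content comparison only through an explicit toList() view.
 */
final class IdentitySeq {
    private final List<Character> symbols = new ArrayList<>();

    IdentitySeq(String residues) {
        for (char c : residues.toCharArray()) {
            symbols.add(c);
        }
    }

    /** Unmodifiable snapshot whose equals/hashCode follow java.util.List (content-based). */
    List<Character> toList() {
        return Collections.unmodifiableList(new ArrayList<>(symbols));
    }

    // equals() and hashCode() are deliberately not overridden: identity semantics.
}
```

Because the returned list is a copy, later changes to the sequence can't corrupt a hash that already holds it. That copying is a choice made in this sketch; whether BioJava's own toList() copies or returns a live view is not stated in this thread.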

Anyway, I thought I would share my random thoughts with you as it's
Friday afternoon and my brain is fried. I shall go back to list
lurking now!

Cheers

Phil


-- 
Phillip Lord,				Phone: +44 (0) 161 275 6138
PostDoctoral Research Associate,        Email: p.lord@russet.org.uk
Department of Computer Science          http://www.russet.org.uk  
Kilburn Building                        http://www.cs.man.ac.uk/~phillord
University of Manchester                
Oxford Road
Manchester
M13 9PL