[Biojava-l] equals() method for SymbolList

Schreiber, Mark mark.schreiber@agresearch.co.nz
Mon, 14 Oct 2002 09:02:25 +1300

Actually, AbstractSymbolList (from which all the SymbolLists inherit) in
BioJava live already contains a logical equals() method and a hashCode()
method, although the hashCode() method may not be the most efficient.
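To make "logical" equality concrete, here is a minimal sketch that compares two lists symbol by symbol. It is not BioJava's AbstractSymbolList code. SimpleSymbolList is a hypothetical stand-in: its symbols are plain strings, and symbolAt() is 1-based as in BioJava. Its hashCode() reads every symbol, which is the cost the quoted thread is worried about.

```java
import java.util.List;

// Hypothetical stand-in for a BioJava SymbolList (not the real class):
// symbols are plain strings and symbolAt() is 1-based, mirroring BioJava.
final class SimpleSymbolList {
    private final List<String> symbols;

    SimpleSymbolList(String... symbols) {
        this.symbols = List.of(symbols); // immutable copy (Java 9+)
    }

    int length() {
        return symbols.size();
    }

    String symbolAt(int index) {
        return symbols.get(index - 1); // 1-based index
    }

    // Logical equality: same length and the same symbol at every position,
    // so the list behaves like a string over its symbols.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SimpleSymbolList)) return false;
        SimpleSymbolList other = (SimpleSymbolList) o;
        if (length() != other.length()) return false;
        for (int i = 1; i <= length(); i++) {
            if (!symbolAt(i).equals(other.symbolAt(i))) return false;
        }
        return true;
    }

    // Consistent with equals(), but scans every symbol: O(length) per call.
    @Override
    public int hashCode() {
        int h = 1;
        for (int i = 1; i <= length(); i++) {
            h = 31 * h + symbolAt(i).hashCode();
        }
        return h;
    }
}
```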

- Mark

> -----Original Message-----
> From: Phillip Lord [mailto:p.lord@russet.org.uk] 
> Sent: Saturday, 12 October 2002 5:01 a.m.
> To: biojava-l@biojava.org
> Subject: Re: [Biojava-l] equals() method for SymbolList
> >>>>> "Keith" == Keith James <kdj@sanger.ac.uk> writes:
> >>>>> "Phillip" == Phillip Lord <p.lord@russet.org.uk> writes:
> >>>>> "Matthew" == Matthew Pocock <matthew_pocock@yahoo.co.uk> writes:
>   Matthew> SymbolList should be behaving like a string over its
>   Matthew> symbols. It is silly if it doesn't do this. Hash codes
>   Matthew> should really be calculated in a different (but
>   Matthew> sequence-dependent) way to avoid scanning the whole of very
>   Matthew> large sequences just to do a hash lookup. Anyone got any
>   Matthew> ideas?
>   Phillip> Just make the hash out of, say, the first 10 elements in the
>   Phillip> list. The hash code is not meant to be unique for all
>   Phillip> sequences; it's just a performance enhancement. So long as
>   Phillip> equals() returns false for different sequences, there is
>   Phillip> no problem.
>   Keith> In a similar vein, the array sampling techniques at
>   Keith> http://www273.pair.com/med/columns/Durable6.html
>   Keith> would work, but equals() would get called more often for
>   Keith> sequences with similar base composition. How about the first
>   Keith> 10 and then adding in values for just the indices that are
>   Keith> powers of two?
> It would probably be a good idea to factor in the size of the
> Alphabet as well. If there are only a few symbols, you get a
> much greater chance of a collision, because each element can
> take only a few distinct values.
> You will still get problems, though, if the underlying sequence
> changes while you are using it as a hash key.
> Right, I really am going back to lurking now. 
> Phil
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l@biojava.org 
> http://biojava.org/mailman/listinfo/biojava-l
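The sampled hash code proposed in the quoted thread (the first 10 symbols plus the symbols at power-of-two positions) can be sketched as follows. SampledSymbolList is hypothetical and is not a BioJava class. The alphabet size is simply passed in by the caller. Mixing the alphabet size and the length into the hash is one possible reading of Phillip's suggestion, not an established BioJava approach. equals() still compares every symbol, so a collision only costs an extra comparison.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical mutable symbol list, used only to illustrate a sampled hash code.
// Not a BioJava class: symbols are plain strings and indices are 1-based.
final class SampledSymbolList {
    private static final int PREFIX = 10; // leading symbols that are always hashed

    private final int alphabetSize; // assumption: supplied by the caller
    private final List<String> symbols;

    SampledSymbolList(int alphabetSize, String... symbols) {
        this.alphabetSize = alphabetSize;
        this.symbols = new ArrayList<>(Arrays.asList(symbols)); // private copy
    }

    int length() {
        return symbols.size();
    }

    String symbolAt(int index) {
        return symbols.get(index - 1);
    }

    void setSymbolAt(int index, String symbol) {
        symbols.set(index - 1, symbol);
    }

    // Full comparison: correctness never depends on the sampling.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof SampledSymbolList)) return false;
        SampledSymbolList other = (SampledSymbolList) o;
        return alphabetSize == other.alphabetSize && symbols.equals(other.symbols);
    }

    // Sampled hash: alphabet size and length, the first PREFIX symbols, then the
    // symbols at power-of-two positions after the prefix (16, 32, 64, ...).
    // Equal lists always get equal hashes because only their contents are read.
    @Override
    public int hashCode() {
        int n = length();
        int h = 31 * alphabetSize + n;
        for (int i = 1; i <= Math.min(PREFIX, n); i++) {
            h = 31 * h + symbolAt(i).hashCode();
        }
        // 16 is the first power of two after PREFIX; long avoids int overflow
        // when doubling near Integer.MAX_VALUE.
        for (long i = 16; i <= n; i *= 2) {
            h = 31 * h + symbolAt((int) i).hashCode();
        }
        return h;
    }
}
```

hashCode() reads at most 10 symbols plus one symbol per power of two up to the length. Its cost therefore grows with the logarithm of the length rather than with the length itself. Two lists that differ only at an unsampled position, such as position 11 of a 12-symbol list, get the same hash code but are not equal, which is harmless as long as equals() is correct. Because this class is mutable, changing a sampled symbol after the list has been used as a HashMap key will usually leave the entry unreachable under its new hash. This is the hash-key problem Phillip raises.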
Attention: The information contained in this message and/or attachments
from AgResearch Limited is intended only for the persons or entities
to which it is addressed and may contain confidential and/or privileged
material. Any review, retransmission, dissemination or other use of, or
taking of any action in reliance upon, this information by persons or
entities other than the intended recipients is prohibited by AgResearch
Limited. If you have received this message in error, please notify the
sender immediately.