[Bioperl-l] Re: ComparableI stuff

Thu Apr 15 15:31:05 EDT 2004

I was going to write a more detailed response but probably won't be 
getting to it before Monday due to painful deadlines. Generally, I have 
a number of issues with this.

	- On a very general level, basing equality on equal hash keys is 
dangerous because it violates the standard definition of a hash key: 
equal objects must yield equal keys, but equal keys do not have to mean 
equal objects. You construct the key as a string most of the time, so 
comparing keys is meant as a short-cut for comparing objects, but I 
really would not call them hash keys and at the same time assume equal 
objects iff equal hash keys.

	- Defining object equality for complex objects is more a matter of 
subjective judgment than of objective criteria that we can define and 
impose on everybody. In fact, I think doing so is dangerous, because it 
creates the false impression that people no longer need to make their 
own decision on when to call objects equal and when not to.

As an example, your hashkey() on SeqFeatureI uses the positional 
information but leaves out the sequence on which it sits, leaves out 
the source_tag(), and in fact leaves out the entire tag system. It also 
leaves out a feature's annotation. This definition of equality may be 
fine in some cases, but may also be completely inappropriate in others.

In biosql, as an example, it is completely inappropriate; biosql 
defines two SeqFeature entries as equal if they are on the same 
sequence, have the same primary_tag and source_tag, and are in the same 
position in the sequence's feature array. The features' display_name is 
irrelevant, as is the positional information. If I compared two 
seqfeatures using the ComparableI interface, they might compare as 
unequal, and yet if I stored both, the second would get thrown out with 
a unique key failure. Chado, I believe, has a slightly different 
definition of the unique key and may take the positional information 
into account.

As another example, for genbank/embl features it is also inappropriate 
because it doesn't test for equality of attached annotations. Note that 
you may define two feature table entries in a genbank record as equal 
if they contain the same annotations, regardless of the annotations' 
order of appearance. In that case, comparing the annotation arrays 
element by element would be too strict.
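
To make the ordering point concrete, here is a minimal sketch in Python 
(not BioPerl code). It assumes annotations can be reduced to simple 
(tag, value) pairs; the function names and the sample annotations are 
made up for illustration.

```python
from collections import Counter


def same_annotations_strict(a, b):
    """Element-by-element comparison: the order of annotations matters."""
    return list(a) == list(b)


def same_annotations_unordered(a, b):
    """Multiset comparison: same (tag, value) pairs, in any order."""
    return Counter(a) == Counter(b)


# Hypothetical annotations of two feature table entries (assumed data).
first = [("gene", "abcA"), ("product", "ABC transporter"), ("note", "putative")]
second = [("note", "putative"), ("gene", "abcA"), ("product", "ABC transporter")]

print(same_annotations_strict(first, second))     # False
print(same_annotations_unordered(first, second))  # True
```

Both functions are defensible definitions of "same annotations"; which 
one is right depends on the use case.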

My point here is not that I would urge you to add all these properties 
to the SeqFeatureI->diff implementation, because there may be use cases 
for which your current implementation is perfectly fine. My point is 
rather, I don't see the value of having one definition of equality 
implemented in a way that doesn't allow others to coexist when the one 
that is implemented is going to serve only a third of all use cases.

I guess one of my key problems is that I don't actually understand 
what the exact use case is. Apart from that, the implementation doesn't 
seem to allow for multiple use cases, and multiple use cases will 
inevitably mean different implementations of equality having to 
co-exist peacefully.

What I could rather envision as being useful is a design built around 
'schemes', where you can swap in one 'equality scheme' for another 
depending on what your needs are, and in which somebody who has a need 
for a yet-unimplemented definition could add that implementation and 
then swap it in.
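
A rough sketch of what such swappable schemes might look like, written 
in Python rather than Perl and not based on any existing BioPerl API. 
The Feature class, the scheme names, and the registry are all 
hypothetical; the "positional" scheme only approximates the kind of key 
discussed for hashkey(), and the "biosql" scheme is modelled on the 
biosql unique key described above.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    """Hypothetical stand-in for a sequence feature (not Bio::SeqFeatureI)."""
    seq_id: str
    primary_tag: str
    source_tag: str
    rank: int      # position in the sequence's feature array
    start: int
    end: int
    strand: int = 1


def positional_key(f):
    """Scheme 1 (assumed): only the location on the sequence counts."""
    return (f.start, f.end, f.strand)


def biosql_like_key(f):
    """Scheme 2: sequence, primary_tag, source_tag and rank; location ignored."""
    return (f.seq_id, f.primary_tag, f.source_tag, f.rank)


# Registry of equality schemes; new definitions are added here by name.
SCHEMES = {"positional": positional_key, "biosql": biosql_like_key}


def features_equal(a, b, scheme="positional"):
    """Compare two features under the named equality scheme."""
    key = SCHEMES[scheme]
    return key(a) == key(b)


a = Feature("seq1", "CDS", "EMBL", rank=1, start=10, end=90)
b = Feature("seq1", "CDS", "EMBL", rank=1, start=12, end=90)

print(features_equal(a, b, "positional"))  # False: start positions differ
print(features_equal(a, b, "biosql"))      # True: same unique-key fields

# Somebody with a different need adds a scheme and swaps it in.
SCHEMES["same_sequence"] = lambda f: (f.seq_id,)
print(features_equal(a, b, "same_sequence"))  # True
```

The point of the registry is that no single definition is baked into 
the feature class: the same two features compare as unequal under one 
scheme and as equal under another, and the caller decides which applies.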

	-hilmar

On Thursday, April 15, 2004, at 03:02  AM, Peter van Heusden wrote:

> [...]
> Any comments on the ComparableI interface? Can this be checked into 
> CVS?
>
> Thanks,
> Peter
>
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------