[Bioperl-l] Re: ComparableI stuff
Hilmar Lapp
hlapp at gnf.org
Thu Apr 15 15:31:05 EDT 2004
I was going to write a more detailed response but probably won't be
getting to it before Monday due to painful deadlines. Generally, I have
a number of issues with this.
- On a very general level, basing equality on equal hash keys is
dangerous because it violates the standard definition of a hash key:
equal objects must have equal hash keys, but equal hash keys do not
imply equal objects. You construct the key as a string most of the
time, so comparing keys is meant as a short-cut for comparing objects,
but then I really would not call them hash keys while at the same time
assuming that objects are equal iff their hash keys are equal.
- Defining object equality for complex objects is more a matter of
subjective judgment than of objective criteria that we can define and
impose on everybody. In fact, I think doing so is dangerous, because it
creates the false impression that people no longer need to make their
own appropriate decision on when to call objects equal and when not to.
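
To illustrate the first point, here is a minimal sketch, written in
Python rather than Perl purely for brevity. The Feature class and its
fields are hypothetical and are not the ComparableI or SeqFeatureI API.
It shows two objects that share a hash key and still are not equal,
which is exactly what the usual hash contract permits:

```python
class Feature:
    """Hypothetical stand-in for a sequence feature (not a BioPerl class)."""

    def __init__(self, start, end, primary_tag):
        self.start = start
        self.end = end
        self.primary_tag = primary_tag

    def __hash__(self):
        # The hash key is derived from the position only.
        return hash((self.start, self.end))

    def __eq__(self, other):
        # Equality additionally looks at the primary tag.
        return (self.start, self.end, self.primary_tag) == (
            other.start, other.end, other.primary_tag)


exon = Feature(10, 50, "exon")
cds = Feature(10, 50, "CDS")

same_key = hash(exon) == hash(cds)   # the two keys collide
equal = exon == cds                  # yet the objects differ

# A hash-based container resolves the collision by calling __eq__,
# so the CDS is not mistaken for the exon.
index = {exon: "first"}
found = cds in index
```

Here same_key is true while equal and found are false. If equality
were defined as "same hash key", the lookup would wrongly succeed.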
As an example, your hashkey() on SeqFeatureI uses the positional
information but leaves out the sequence on which it sits, leaves out
the source_tag(), and in fact leaves out the entire tag system. It also
leaves out a feature's annotation. This definition of equality may be
fine in some cases, but may also be completely inappropriate in others.
In biosql, as an example, it is completely inappropriate; biosql
defines two SeqFeature entries as equal that are on the same sequence,
have the same primary_tag, same source_tag, and are in the same
position in the sequence's feature array. The features' display_name is
irrelevant, as is the positional information. If I compared two
seqfeatures using the ComparableI interface, they may compare as
unequal and yet if I store them I'd get thrown out with a unique key
failure. Chado I believe has a slightly different definition of the
unique key and may take the positional information into account.
As another example, for genbank/embl features it is also inappropriate
because it doesn't test for equality of attached annotations. Note that
you may define two feature table entries in a genbank record as equal
if they contain the same annotations, regardless of the annotations'
order of appearance. I.e., comparing the annotation arrays element by
element would be too strict then.
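
The difference between the strict and the order-insensitive comparison
can be sketched as follows. This is an illustration in Python under
assumed data, not BioPerl code: annotations are modeled as plain
(tag, value) pairs, the values are made up, and same_annotations is a
hypothetical helper name.

```python
from collections import Counter


def same_annotations(a, b):
    """True if both lists hold the same (tag, value) pairs in any order.

    Counter compares multisets, so a pair that occurs twice in one list
    must also occur twice in the other.
    """
    return Counter(a) == Counter(b)


# Made-up annotation lists that differ only in order of appearance.
first = [("gene", "abc1"), ("note", "hypothetical"), ("db_xref", "EXAMPLE:1")]
second = [("db_xref", "EXAMPLE:1"), ("gene", "abc1"), ("note", "hypothetical")]

strict = first == second                    # element-by-element comparison
relaxed = same_annotations(first, second)   # order-insensitive comparison
```

The element-by-element comparison reports the two lists as different,
whereas the multiset comparison treats them as equal. Which of the two
is "right" depends on the use case.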
My point here is not that I would urge you to add all these properties
to the SeqFeatureI->diff implementation, because there may be use cases
for which your current implementation is perfectly fine. My point is
rather, I don't see the value of having one definition of equality
implemented in a way that doesn't allow others to coexist when the one
that is implemented is going to serve only a third of all use cases.
I guess one of my key problems is that in fact I don't understand what
the exact use case is. Apart from that, the implementation doesn't seem
to allow for multiple use cases, even though multiple use cases will
inevitably require different implementations of equality to peacefully
co-exist.
What I could rather envision as being useful is a design along
'schemes', where you can swap in one 'equality scheme' for another
depending on what your needs are, and in which somebody who has a need
for a yet-unimplemented definition could add that implementation and
then swap it in.
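
A rough sketch of what such swappable schemes might look like, again in
Python for brevity and not as a proposed BioPerl interface. All names
here (Feature, SCHEMES, features_equal, the scheme labels) are
hypothetical. The "positional" scheme is only loosely modeled on the
hashkey() behaviour described above, and the "biosql" scheme on the
unique key as described in this message.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    """Hypothetical feature record; field names are assumptions."""
    seq_id: str
    primary_tag: str
    source_tag: str
    rank: int    # position in the sequence's feature array
    start: int
    end: int


# Each scheme maps a feature to the tuple of properties that define
# identity under that scheme.
SCHEMES = {
    "positional": lambda f: (f.start, f.end),
    "biosql": lambda f: (f.seq_id, f.primary_tag, f.source_tag, f.rank),
}


def features_equal(a, b, scheme):
    """Compare two features under the named equality scheme."""
    key = SCHEMES[scheme]
    return key(a) == key(b)


a = Feature("seq1", "exon", "genscan", 1, 10, 50)
b = Feature("seq1", "exon", "genscan", 1, 12, 50)

by_position = features_equal(a, b, "positional")   # coordinates differ
by_biosql = features_equal(a, b, "biosql")         # same unique key

# Somebody with a yet-unimplemented definition adds it and swaps it in.
SCHEMES["by_type"] = lambda f: (f.primary_tag,)
by_type = features_equal(a, b, "by_type")
```

The two features compare as unequal under the positional scheme but as
equal under the biosql-style scheme, which is the situation that would
end in a unique key failure on storing them. Keeping the definition of
equality out of the feature class lets both views coexist.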
-hilmar
On Thursday, April 15, 2004, at 03:02 AM, Peter van Heusden wrote:
> [...]
> Any comments on the ComparableI interface? Can this be checked into
> CVS?
>
> Thanks,
> Peter
>
>
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------