[Bioperl-l] Re: No joins
Ewan Birney
birney@ebi.ac.uk
Thu, 15 Aug 2002 17:36:06 +0100 (BST)
Brian -
I would certainly agree with you that joins are bad, and in fact Bioperl
originally had a hierarchical, feature-only system, and joins implicitly
went into that hierarchy.
However, as more people used it, being able to store and process 100% of
EMBL/GenBank became a priority, and we bolted on the location stuff. The
location stuff was really driven by the fuzzies (aaaah, the fuzzies),
which are distinctly hard to handle inside hierarchical features (what does
BioJava do with the fuzzies?). But most fuzzies are also joins (in fact
a lot of joins have fuzzy ends), so... it became the de facto way to handle
joins.
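To make "joins with fuzzy ends" concrete, here is a minimal sketch in
Python (not Bioperl or BioJava code) that splits a feature-table location
such as join(<1..100,200..>300) into spans and records which ends are
fuzzy. The names Position, Span and parse_location are hypothetical, and
only plain "a..b" spans with optional < and > markers are handled; every
other location form is rejected rather than guessed at.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Position:
    value: int
    fuzzy: str  # "exact", "before" (written <n) or "after" (written >n)


@dataclass
class Span:
    start: Position
    end: Position


# One endpoint: an optional < or > marker followed by digits.
_POS = r"([<>]?)(\d+)"
_SPAN = re.compile(rf"^{_POS}\.\.{_POS}$")


def _position(marker: str, digits: str) -> Position:
    kind = {"": "exact", "<": "before", ">": "after"}[marker]
    return Position(int(digits), kind)


def parse_location(text: str) -> List[Span]:
    """Parse 'a..b' or 'join(a..b,c..d,...)' into a list of spans."""
    text = text.strip()
    if text.startswith("join(") and text.endswith(")"):
        parts = text[len("join("):-1].split(",")
    else:
        parts = [text]

    spans = []
    for part in parts:
        match = _SPAN.match(part.strip())
        if match is None:
            # Anything outside the small subset above is refused.
            raise ValueError(f"unsupported location: {part!r}")
        spans.append(Span(_position(match.group(1), match.group(2)),
                          _position(match.group(3), match.group(4))))
    return spans
```

Calling parse_location("join(<1..100,200..>300)") gives two spans, the
first with a "before" start and the second with an "after" end. The sketch
only records the markers; it says nothing about what they mean, which is
exactly the hard part.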
Of course, the frustrating thing is that no one *can* use the fuzzies:
their semantic interpretation is just... impossible to keep
consistent across more than 2 records. Fuzzies are for human warm-fuzzy
feelings that the data format is representing everything they know, and are
just a semantic mire for computers.
I agree it gives us so much semantic rope to hang ourselves with that it is
scary. But there is no obvious ideal solution:
- somehow represent all things inside hierarchical features, including
the fuzzies (brain-ache)
- not handle 100% of GenBank (means a large number of use cases fail)
If there is something obvious I am missing here, shout, but this is
somewhere between a rock and a hard place in my experience.
Practical question - what does BioJava do with the fuzzies?
-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>.
-----------------------------------------------------------------