[Bioperl-l] Re: No joins

Ewan Birney birney@ebi.ac.uk
Thu, 15 Aug 2002 17:36:06 +0100 (BST)


Brian -


I would certainly agree with you that joins are bad, and in fact Bioperl 
originally had a hierarchical, feature-only system, and joins were 
implicitly folded into that hierarchy.



However, as more people used it, being able to store and process 100% of 
EMBL/GenBank became a priority, and we bolted on the location stuff. The 
location stuff was really driven in by the fuzzies (aaaah, the fuzzies), 
which are distinctly hard to handle inside hierarchical features (what does 
BioJava do with the fuzzies?). But most fuzzies are also joins (in fact 
a lot of joins have fuzzy ends), so... it became the de facto way to handle 
joins.


Of course the frustrating thing is that no one *can* use the fuzzies: 
the semantic interpretation of fuzzies is just... impossible to keep 
consistent across more than 2 records. Fuzzies are there to give humans 
the warm-fuzzy feeling that the data format represents everything they 
know, and are just a semantic mire for computers.



I agree it gives us so much semantic rope to hang ourselves with that it 
is scary. But there is not an obvious ideal solution:

   - somehow represent all things inside hierarchical features, including 
the fuzzies (brain-ache)

   - not handle 100% of GenBank (means a large number of use cases fail)



If there is something obvious I am missing here, shout, but this is 
somewhere between rock-and-hard-place in my experience.



Practical question - what does BioJava do with the Fuzzies?










-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>. 
-----------------------------------------------------------------