[Biojava-dev] FeatureFilter optimizer
Matthew Pocock
matthew_pocock@yahoo.co.uk
Mon, 11 Nov 2002 22:43:15 +0000
Hi,
It's been a marathon struggle against gargantuan odds, but I've done it.
FilterUtils has been revamped. It has many methods for conveniently
making filters and comparing them. Also, there is a very funky method
called optimize() that takes a filter and tries to return a filter that
is equivalent to it but has fewer terms.
Why is this useful, I hear you ask? Well, I'm going to tell you even if
you don't care.
Imagine we have 3 pots of features. One is a pot of Exons, one is a pot
of Repeats and one is a pot of Snips. Using a MergingFeatureHolder, we
can make these appear to be a single FeatureHolder so that they all
appear to be siblings on the same Sequence. The individual pots may be
backed by DAS, BioSQL and an in-memory collection respectively. When we
filter the combined view, we really want to avoid sending queries off
to pots that are guaranteed to return an empty set. For example, if I
queried this combined collection for Exons, then we don't really want
to send an SQL query off attempting to pull Exons out of an SQL
database or scan 100,000 in-memory snips just in case one is an exon.
Using the optimize method, the MergingFeatureHolder can look at the
membership filter of each pot in turn, construct and(membership, query)
and optimize that. If it comes out as the empty filter none() then it
knows not to actually dispatch the filter query to that pot. It's
guaranteed to return nothing at all. Clever, eh?
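
To make the pruning idea concrete, here is a minimal, self-contained
sketch. It is a toy model and not the BioJava API: ToyFilter,
optimizedAnd() and ToyMergedHolder are made-up names, filters are
restricted to "all", "none" and "by type", and the pot names used with
it are invented for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model, NOT the BioJava API: a filter is ALL, NONE, or "type == X".
final class ToyFilter {
    static final ToyFilter ALL = new ToyFilter("*all*");
    static final ToyFilter NONE = new ToyFilter("*none*");

    final String type;

    private ToyFilter(String type) { this.type = type; }

    static ToyFilter byType(String type) { return new ToyFilter(type); }

    // Returns a simplified filter equivalent to and(a, b) in this toy model.
    static ToyFilter optimizedAnd(ToyFilter a, ToyFilter b) {
        if (a == NONE || b == NONE) return NONE; // anything AND none == none
        if (a == ALL) return b;                  // all AND b == b
        if (b == ALL) return a;                  // a AND all == a
        // Two by-type filters overlap only when they name the same type.
        return a.type.equals(b.type) ? a : NONE;
    }
}

// Hypothetical stand-in for a merged view over several pots of features.
final class ToyMergedHolder {
    // Pot name -> membership filter describing what that pot can contain.
    private final Map<String, ToyFilter> membership = new LinkedHashMap<>();

    void addPot(String name, ToyFilter membershipFilter) {
        membership.put(name, membershipFilter);
    }

    // Names of the pots that could return something for this query.
    List<String> potsToQuery(ToyFilter query) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, ToyFilter> pot : membership.entrySet()) {
            ToyFilter combined = ToyFilter.optimizedAnd(pot.getValue(), query);
            if (combined != ToyFilter.NONE) {
                result.add(pot.getKey()); // worth dispatching the query here
            }
        }
        return result;
    }
}
```

With three pots registered under byType("Exon"), byType("Repeat") and
byType("SNP"), potsToQuery(ToyFilter.byType("Exon")) returns only the
name of the first pot, so the other two back ends are never touched.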
This even lets us do more magical things. We can filter by allowable
annotations. If I were to filter a merged view of Ensembl features by
transcript.id, the merging layer can see which pots of features can have
a transcript.id, and thus drill down to the FeatureHolder representing
the Ensembl transcript table, and avoid the others. This would be done
by having each FeatureHolder representing an Ensembl features table
publish a FeatureFilter.ByAnnotationType() that says what properties
(IDs and otherwise) it provides and stating that that is all it can
provide. If some types of features could have an Xref property, we could
filter by the presence of that and expect sensible things to happen.
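
The annotation case can be sketched the same way. Again this is a toy
model under stated assumptions, not BioJava's
FeatureFilter.ByAnnotationType: ToyAnnotationRouter, publish() and
potsThatMayHave() are hypothetical names, and each pot simply declares
the complete set of property keys it can provide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model, NOT the BioJava API: routes a property query to the pots
// that have declared they can carry that property.
final class ToyAnnotationRouter {
    // Pot name -> the full set of property keys the pot says it provides.
    private final Map<String, Set<String>> keysByPot = new LinkedHashMap<>();

    // A pot publishes every property it can provide, and nothing else.
    void publish(String pot, String... keys) {
        keysByPot.put(pot, new LinkedHashSet<>(Arrays.asList(keys)));
    }

    // Pots whose published keys include the one being filtered on.
    List<String> potsThatMayHave(String key) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<String>> pot : keysByPot.entrySet()) {
            if (pot.getValue().contains(key)) {
                result.add(pot.getKey());
            }
        }
        return result;
    }
}
```

Because each pot states that its published set is everything it can
provide, a pot that does not list a key can be skipped safely rather
than scanned just in case.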
This code was started on Saturday and finished a few moments ago. There
is an extensive JUnit test for it, but I by no means guarantee that it
works in all cases. If you are interested at all in feature filters or
in query languages or in ontologies, please take this code for a spin
and try to break it, and see what really fun things it can do.
Matthew
--
BioJava Consulting LTD - Support and training for BioJava
http://www.biojava.co.uk