[Biojava-dev] FeatureFilter optimizer

Matthew Pocock matthew_pocock@yahoo.co.uk
Mon, 11 Nov 2002 22:43:15 +0000


Hi,

It's been a marathon struggle against gargantuan odds, but I've done it. 
FilterUtils has been re-vamped. It has many methods for conveniently 
making filters and comparing them. Also, there is a very funky method 
called optimize() that takes a filter and tries to return a filter that 
is equivalent to it but has fewer terms.
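To make "equivalent but with fewer terms" concrete, here is a minimal sketch of the kind of rewriting such an optimizer can do. It uses a toy filter algebra defined inline (All, None, ByType, And, Or, Not). These are not BioJava's FeatureFilter classes, and this is not the real FilterUtils.optimize() implementation. It assumes every feature has exactly one type, so two different type filters can never both match.

```java
// Toy filter algebra (NOT BioJava's FeatureFilter classes) illustrating
// the kind of term-reducing rewrites an optimize() method can perform.
public class FilterOptimizer {
    interface Filter {}
    record All() implements Filter { public String toString() { return "all"; } }
    record None() implements Filter { public String toString() { return "none"; } }
    record ByType(String type) implements Filter { public String toString() { return "byType(" + type + ")"; } }
    record And(Filter a, Filter b) implements Filter { public String toString() { return "and(" + a + ", " + b + ")"; } }
    record Or(Filter a, Filter b) implements Filter { public String toString() { return "or(" + a + ", " + b + ")"; } }
    record Not(Filter a) implements Filter { public String toString() { return "not(" + a + ")"; } }

    static final Filter ALL = new All();
    static final Filter NONE = new None();

    // Bottom-up simplification: optimize children first, then apply local rules.
    static Filter optimize(Filter f) {
        if (f instanceof And n) {
            Filter a = optimize(n.a()), b = optimize(n.b());
            if (a.equals(NONE) || b.equals(NONE)) return NONE;       // and(x, none) = none
            if (a.equals(ALL)) return b;                              // and(all, x) = x
            if (b.equals(ALL)) return a;
            if (a.equals(b)) return a;                                // and(x, x) = x
            if (a.equals(new Not(b)) || b.equals(new Not(a))) return NONE; // and(x, not x) = none
            // Assumption: a feature has exactly one type, so distinct types are disjoint.
            if (a instanceof ByType x && b instanceof ByType y && !x.type().equals(y.type())) return NONE;
            return new And(a, b);
        }
        if (f instanceof Or o) {
            Filter a = optimize(o.a()), b = optimize(o.b());
            if (a.equals(ALL) || b.equals(ALL)) return ALL;           // or(x, all) = all
            if (a.equals(NONE)) return b;                             // or(none, x) = x
            if (b.equals(NONE)) return a;
            if (a.equals(b)) return a;                                // or(x, x) = x
            if (a.equals(new Not(b)) || b.equals(new Not(a))) return ALL;   // or(x, not x) = all
            return new Or(a, b);
        }
        if (f instanceof Not neg) {
            Filter a = optimize(neg.a());
            if (a.equals(ALL)) return NONE;
            if (a.equals(NONE)) return ALL;
            if (a instanceof Not inner) return inner.a();             // not(not x) = x
            return new Not(a);
        }
        return f; // All, None and ByType are already minimal
    }

    public static void main(String[] args) {
        Filter exons = new ByType("exon");
        Filter repeats = new ByType("repeat");
        System.out.println(optimize(new And(exons, repeats)));           // none
        System.out.println(optimize(new And(ALL, new Or(exons, NONE)))); // byType(exon)
        System.out.println(optimize(new Not(new Not(exons))));           // byType(exon)
    }
}
```

Records and `instanceof` patterns need Java 16 or later. A real optimizer would also have to handle richer filters (locations, annotations, parent/child relationships), which is where most of the difficulty lies.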

Why is this useful, I hear you ask? Well, I'm going to tell you even if 
you didn't care.

Imagine we have 3 pots of features. One is a pot of Exons, one is a pot 
of Repeats and one is a pot of SNPs. Using a MergingFeatureHolder, we 
can make these appear to be a single FeatureHolder so that they all 
appear to be siblings on the same Sequence. The individual pots may be 
backed by DAS, BioSQL and an in-memory collection respectively. When we 
filter the combined view, we really want to avoid sending off queries 
to pots that are guaranteed to return an empty set. For example, if I 
queried this combined collection for Exons, then we don't really want 
to send an SQL query off attempting to pull Exons out of the BioSQL 
database, or scan 100,000 in-memory SNPs just in case one is an exon.

Using the optimize method, the MergingFeatureHolder can look at the 
membership filter of each pot in turn, construct and(membership, query) 
and optimize that. If it comes out as the empty filter none() then it 
knows not to actually dispatch the filter query to that pot. It's 
guaranteed to return nothing at all. Clever, eh?
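The dispatch decision can be sketched as follows. This is a self-contained toy model: the Pot record, potsToQuery() and the cut-down filter types are hypothetical names for illustration, not the actual MergingFeatureHolder code. The optimizer here knows only the handful of rules needed to show the idea, and again assumes that distinct feature types never overlap.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a merged view deciding which backing "pots" to query.
// Pot, potsToQuery and these filter types are hypothetical, not BioJava API.
public class MergedDispatch {
    interface Filter {}
    record All() implements Filter {}
    record None() implements Filter {}
    record ByType(String type) implements Filter {}
    record And(Filter a, Filter b) implements Filter {}

    static final Filter ALL = new All();
    static final Filter NONE = new None();

    // Cut-down optimizer: just enough rules to detect provably empty queries.
    static Filter optimize(Filter f) {
        if (f instanceof And n) {
            Filter a = optimize(n.a()), b = optimize(n.b());
            if (a.equals(NONE) || b.equals(NONE)) return NONE;
            if (a.equals(ALL)) return b;
            if (b.equals(ALL)) return a;
            if (a.equals(b)) return a;
            // Two different type filters cannot both match (equal case handled above).
            if (a instanceof ByType && b instanceof ByType) return NONE;
            return new And(a, b);
        }
        return f;
    }

    // A pot of features plus the filter describing everything it can contain.
    record Pot(String name, Filter membership) {}

    // Names of the pots that actually need to receive the query.
    static List<String> potsToQuery(List<Pot> pots, Filter query) {
        List<String> targets = new ArrayList<>();
        for (Pot pot : pots) {
            Filter combined = optimize(new And(pot.membership(), query));
            if (!combined.equals(NONE)) {
                targets.add(pot.name());
            }
            // Otherwise the pot is guaranteed to return nothing, so the
            // DAS / SQL / in-memory scan is skipped entirely.
        }
        return targets;
    }

    public static void main(String[] args) {
        List<Pot> pots = List.of(
            new Pot("DAS exons", new ByType("exon")),
            new Pot("BioSQL repeats", new ByType("repeat")),
            new Pot("in-memory SNPs", new ByType("snp")));
        System.out.println(potsToQuery(pots, new ByType("exon"))); // [DAS exons]
        System.out.println(potsToQuery(pots, ALL));                 // all three pots
    }
}
```

The key design point is that the check is conservative: a pot is skipped only when the combined filter provably optimizes to none, so an optimizer that misses a simplification costs a wasted query, never a missing result.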

This even lets us do more magical things. We can filter by allowable 
annotations. If I were to filter a merged view of ensembl features by 
transcript.id, the merging layer can see which pots of features can have 
a transcript.id, and thus drill down to the FeatureHolder representing 
the ensembl transcript table, and avoid the others. This would be done 
by having each FeatureHolder representing an Ensembl features table 
publish a FeatureFilter.ByAnnotationType() that says what properties 
(IDs and otherwise) it provides and stating that that is all it can 
provide. If some types of features could have an Xref property, we could 
filter by the presence of that and expect sensible things to happen.
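A rough sketch of the annotation-based routing, assuming each pot publishes the annotation keys its features can carry plus a flag saying whether that list is exhaustive (the "that is all it can provide" part). The Pot record, mayHave() and potsWithProperty() are hypothetical stand-ins for a FeatureFilter.ByAnnotationType-style declaration, and the table names and keys are purely illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical model of routing a "has property X" query by declared annotation types.
public class AnnotationRouting {
    // properties: annotation keys features in this pot may carry.
    // closed: true if the pot promises it provides no other keys.
    record Pot(String name, Set<String> properties, boolean closed) {
        // Could any feature in this pot carry the given key?
        boolean mayHave(String key) {
            return !closed || properties.contains(key);
        }
    }

    // Pots that must be asked for features having the given annotation key.
    static List<String> potsWithProperty(List<Pot> pots, String key) {
        List<String> targets = new ArrayList<>();
        for (Pot p : pots) {
            if (p.mayHave(key)) {
                targets.add(p.name());
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        List<Pot> pots = List.of(
            new Pot("transcript", Set.of("transcript.id", "xref"), true),
            new Pot("exon", Set.of("exon.id"), true),
            new Pot("repeat_feature", Set.of("repeat.name"), true),
            new Pot("external", Set.of(), false)); // makes no promises, so is always asked
        System.out.println(potsWithProperty(pots, "transcript.id")); // [transcript, external]
        System.out.println(potsWithProperty(pots, "xref"));          // [transcript, external]
    }
}
```

The "closed" flag is what makes pruning safe: a pot that does not declare its full set of properties can never be ruled out, which is why each table needs to state that its list is all it can provide.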

This code was started on Saturday and finished a few moments ago. There 
is an extensive JUnit test for it, but I by no means guarantee that it 
works in all cases. If you are interested at all in feature filters, 
query languages or ontologies, please take this code for a spin, try 
to break it, and see what really fun things it can do.

Matthew

-- 
BioJava Consulting LTD - Support and training for BioJava
http://www.biojava.co.uk
