[Biojava-dev] Feature filtering changes?

Thomas Down td2@sanger.ac.uk
Fri, 11 Oct 2002 11:59:39 +0100


Hi...

I'm considering making a few changes and additions to the
way features are queried.  The main changes are:

1. Add a method on SequenceDB:

     public FeatureHolder filter(FeatureFilter ff);

This is equivalent to applying ff to all sequences in the database
and merging the results.  It's already implemented for
BioSQLSequenceDB.  I'll add a naive implementation (using
sequenceIterator) on AbstractSequenceDB.
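
As a rough sketch of what the naive version might look like: the
Feature and Sequence classes below, and the use of
java.util.function.Predicate as the filter, are simplified stand-ins
invented for illustration, not the real BioJava types.  The only point
is the shape of the loop: visit every sequence, filter it, merge the hits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

public class NaiveDbFilterSketch {
    // Hypothetical stand-in for a feature; not BioJava's Feature interface.
    static final class Feature {
        final String sequenceName;
        final String type;
        final String id;
        Feature(String sequenceName, String type, String id) {
            this.sequenceName = sequenceName;
            this.type = type;
            this.id = id;
        }
    }

    // Hypothetical stand-in for a sequence that holds a flat list of features.
    static final class Sequence {
        final String name;
        final List<Feature> features;
        Sequence(String name, List<Feature> features) {
            this.name = name;
            this.features = features;
        }
        // Per-sequence filter: keep the features the predicate accepts.
        List<Feature> filter(Predicate<Feature> ff) {
            List<Feature> hits = new ArrayList<>();
            for (Feature f : features) {
                if (ff.test(f)) hits.add(f);
            }
            return hits;
        }
    }

    // Naive database-wide filter: apply ff to each sequence and merge the results.
    static List<Feature> filter(Iterable<Sequence> db, Predicate<Feature> ff) {
        List<Feature> merged = new ArrayList<>();
        for (Sequence seq : db) {
            merged.addAll(seq.filter(ff));
        }
        return merged;
    }

    // Small made-up database: two sequences, three features.
    static List<Sequence> exampleDb() {
        Sequence chr1 = new Sequence("chr1", Arrays.asList(
                new Feature("chr1", "gene", "gene42"),
                new Feature("chr1", "exon", "e1")));
        Sequence chr2 = new Sequence("chr2", Arrays.asList(
                new Feature("chr2", "gene", "gene7")));
        return Arrays.asList(chr1, chr2);
    }

    public static void main(String[] args) {
        // Find a gene by ID without knowing which sequence it lives on.
        for (Feature f : filter(exampleDb(), f -> f.id.equals("gene42"))) {
            System.out.println(f.id + " found on " + f.sequenceName);  // gene42 found on chr1
        }
    }
}
```

This costs a full scan of the database per query, which is why a
backend that can push the query down to its own index would want to
override it.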

This is really useful for cases like finding a gene by ID in
a whole genome database.  It's a small step in moving us
away from an entirely sequence-centric view of annotation.


2. Add a new single argument filter method on FeatureHolder,
   without the `recurse' flag.  This flag was a first attempt
   at providing an interface for searching hierarchical features,
   but it's not been entirely successful.  In particular, if you
   call `filter' again on a FeatureHolder which was created by
   a filter operation, you're likely to end up with duplicated
   features.  I'd prefer to see filter operations constrained
   using the FeatureFilter grammar itself.

   The filter(FeatureFilter, boolean) method won't be removed
   (since it's widely used), but will eventually be deprecated.

3. Add an extra standard FeatureFilter implementation.

       public static class FeatureFilter.IsTopLevel;

   Non-recursive searches can then be emulated with:

        myseq.filter(new FeatureFilter.And(
                             new FeatureFilter.ByType("foo"),
                             new FeatureFilter.IsTopLevel()
                     ));
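
   To make the intended semantics concrete, here is a toy model.
   Everything in it (the Feature class with a parent link, the Filter
   interface, and the ByType, IsTopLevel and And classes) is a
   stand-in written for this example and only mirrors the names
   proposed above; it is not the BioJava implementation.  It assumes
   that filter always walks the whole hierarchy and that "top level"
   means "has no parent feature".

```java
import java.util.ArrayList;
import java.util.List;

public class TopLevelFilterSketch {
    // Stand-in feature; parent == null means it sits directly on the sequence.
    static final class Feature {
        final String type;
        final Feature parent;
        final List<Feature> children = new ArrayList<>();
        Feature(String type, Feature parent) {
            this.type = type;
            this.parent = parent;
            if (parent != null) parent.children.add(this);
        }
    }

    interface Filter { boolean accept(Feature f); }

    static final class ByType implements Filter {
        private final String type;
        ByType(String type) { this.type = type; }
        public boolean accept(Feature f) { return type.equals(f.type); }
    }

    // Accepts only features with no parent feature.
    static final class IsTopLevel implements Filter {
        public boolean accept(Feature f) { return f.parent == null; }
    }

    static final class And implements Filter {
        private final Filter a;
        private final Filter b;
        And(Filter a, Filter b) { this.a = a; this.b = b; }
        public boolean accept(Feature f) { return a.accept(f) && b.accept(f); }
    }

    // Single-argument filter: always descends; limiting depth is the filter's job.
    static List<Feature> filter(List<Feature> features, Filter ff) {
        List<Feature> hits = new ArrayList<>();
        for (Feature f : features) {
            if (ff.accept(f)) hits.add(f);
            hits.addAll(filter(f.children, ff));
        }
        return hits;
    }

    // Two top-level features ("foo" and "bar"), each with one nested "foo".
    static List<Feature> exampleRoots() {
        Feature foo = new Feature("foo", null);
        new Feature("foo", foo);
        Feature bar = new Feature("bar", null);
        new Feature("foo", bar);
        List<Feature> roots = new ArrayList<>();
        roots.add(foo);
        roots.add(bar);
        return roots;
    }

    public static void main(String[] args) {
        List<Feature> roots = exampleRoots();
        int all = filter(roots, new ByType("foo")).size();
        int top = filter(roots, new And(new ByType("foo"), new IsTopLevel())).size();
        System.out.println(all + " matches in total, " + top + " at top level");
        // prints "3 matches in total, 1 at top level"
    }
}
```

   Because the depth restriction lives in the filter rather than in a
   flag, it composes with And, Or and friends like any other
   constraint.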


Any objections to these?  They should have minimal impact on
existing code, but should prop up the current query system
until BioJava2 is ready.

     Thomas.