[Biojava-dev] Introducing feature-schemas

Thomas Down td2@sanger.ac.uk
Sun, 24 Nov 2002 23:07:06 +0000


I've just checked in a patch which introduces a simple `feature
schema' mechanism for BioJava.  This touches quite a lot of
files, but the good news is that the impact should be relatively
minor at first -- the only people who *have* to pay attention
to this are FeatureHolder implementors, and people who directly
use or manipulate MergeFeatureHolder objects.

A feature schema is simply a FeatureFilter which provides an
`upper bound' on a set of features: every feature in the set is
accepted by the filter.  In the past, filters like this have
already been used in various ad hoc ways.  For example,
MergeFeatureHolder had a method (now removed) for specifying
a `membership filter' on a sub-FeatureHolder.

The new approach involves one new method on the FeatureHolder
interface:

     public FeatureFilter getSchema();

This returns a FeatureFilter which will accept all top-level
Features in the FeatureHolder.  A schema can also give
information about their child features.  For example:

    new FeatureFilter.And(
        new FeatureFilter.ByType("transcript"),
        new FeatureFilter.OnlyChildren(
            new FeatureFilter.And(
                new FeatureFilter.Or(
                    new FeatureFilter.ByType("exon"),
                    new FeatureFilter.ByType("translation")
                ),
                FeatureFilter.leaf
            )
        )
    );

This schema indicates that:

   - All top level features have type "transcript"
   - Transcripts may have child features of type "exon" or "translation"
   - There are no grandchild features.
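
As a rough illustration of what these three rules mean in practice,
here is a minimal, self-contained sketch that checks a small feature
tree against them.  It does not use the BioJava classes at all:
ToyFeature, CHILD_TYPES and conforms are hypothetical names invented
for this example, and the rules are hard-coded rather than derived
from a FeatureFilter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TranscriptSchemaSketch {
    /** Hypothetical stand-in for a feature: a type name plus child features. */
    static final class ToyFeature {
        final String type;
        final List<ToyFeature> children = new ArrayList<ToyFeature>();

        ToyFeature(String type, ToyFeature... kids) {
            this.type = type;
            children.addAll(Arrays.asList(kids));
        }
    }

    /** Child types permitted by the example schema. */
    static final Set<String> CHILD_TYPES =
        new HashSet<String>(Arrays.asList("exon", "translation"));

    /** True if every top-level feature obeys the three rules listed above. */
    static boolean conforms(List<ToyFeature> topLevel) {
        for (ToyFeature f : topLevel) {
            // Rule 1: all top-level features have type "transcript".
            if (!"transcript".equals(f.type)) return false;
            for (ToyFeature child : f.children) {
                // Rule 2: children are of type "exon" or "translation".
                if (!CHILD_TYPES.contains(child.type)) return false;
                // Rule 3: children are leaves, so there are no grandchildren.
                if (!child.children.isEmpty()) return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        ToyFeature ok = new ToyFeature("transcript",
            new ToyFeature("exon"), new ToyFeature("translation"));
        ToyFeature nested = new ToyFeature("transcript",
            new ToyFeature("exon", new ToyFeature("repeat")));
        System.out.println(conforms(Arrays.asList(ok)));      // prints true
        System.out.println(conforms(Arrays.asList(nested)));  // prints false
    }
}
```

Note that a transcript with no children at all also conforms: the
schema is an upper bound, so it says what may appear, not what must.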

It is, of course, valid to return the non-informative schema,
FeatureFilter.all.

There are a number of possible uses for schemas.  The primary
reason for their existence is query-optimization.  It is possible
(using the FilterUtils.areDisjoint method) to compare a given
query FeatureFilter against a schema, and potentially prove that
this query will return an empty set.  Other possible applications
include introspecting the available feature types (for display
to the user).
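
To make the query-optimization idea concrete, here is a minimal
sketch of schema-based pruning.  It is a toy model, not the BioJava
implementation: both the query and each schema are reduced to a plain
set of accepted type names, and types, areDisjoint and holdersToSearch
are hypothetical helpers written for this example (the areDisjoint
here is a stand-in for, not a copy of, FilterUtils.areDisjoint).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SchemaPruningSketch {
    /** Convenience helper: build a set of type names. */
    static Set<String> types(String... names) {
        return new HashSet<String>(Arrays.asList(names));
    }

    /** Toy disjointness test: no type name is accepted by both sets. */
    static boolean areDisjoint(Set<String> query, Set<String> schema) {
        return Collections.disjoint(query, schema);
    }

    /** Names of the holders which still have to be searched for this query. */
    static List<String> holdersToSearch(Set<String> query,
                                        Map<String, Set<String>> schemas) {
        List<String> result = new ArrayList<String>();
        for (Map.Entry<String, Set<String>> e : schemas.entrySet()) {
            // If query and schema are provably disjoint, the holder cannot
            // contain a match, so it is skipped without being searched.
            if (!areDisjoint(query, e.getValue())) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed example data: two sub-holders with different schemas.
        Map<String, Set<String>> schemas = new LinkedHashMap<String, Set<String>>();
        schemas.put("genes", types("transcript"));
        schemas.put("repeats", types("repeat"));

        System.out.println(holdersToSearch(types("repeat"), schemas));
        // prints [repeats]
    }
}
```

A holder which returns the non-informative schema has no equivalent
in this toy model; with real filters it can never be proved disjoint
from a query, so it always has to be searched.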

All FeatureHolder implementations in biojava-live should now
return valid schema information, although it is not always
as restrictive as it could be.

Let me know if there are any problems with this,

     Thomas.