[Biojava-dev] Feature interface change

Thomas Down td2@sanger.ac.uk
Thu, 22 Aug 2002 17:12:28 +0100


Hi...

Here are a couple more thoughts on the FeatureCard/FeatureMapping
interfaces...

I'm wondering if it's worth allowing one FeatureCard to be a
specialization of another.  This could be applied as follows in
the case of repeat features:

   - One FeatureCard for all interspersed repeats
   - One FeatureCard for all `Alu' repeats.
   - One FeatureCard for `AluJo' repeats.
   - (Optionally) a FeatureCard for one specific copy of AluJo
     for which an annotator has noted some interesting feature.

I guess having a hierarchy like this implies that there should
probably be a `blank' FeatureCard which is the root of the
specialization hierarchy.
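
To make the hierarchy concrete, here is a rough sketch of what
a specializable card could look like.  Everything in it is an
assumption for illustration only: the specialize() and
isSpecializationOf() methods and the ROOT constant are made-up
names, and it assumes a single parent per card just to keep it
short.

```java
import java.util.Objects;

/** Hypothetical sketch; none of these members exist in BioJava. */
final class FeatureCard {
    /** The 'blank' card at the root of the specialization hierarchy. */
    static final FeatureCard ROOT = new FeatureCard("(blank)", null);

    private final String name;
    private final FeatureCard parent; // null only for ROOT

    private FeatureCard(String name, FeatureCard parent) {
        this.name = name;
        this.parent = parent;
    }

    /** Creates a new card that specializes this one. */
    FeatureCard specialize(String name) {
        return new FeatureCard(Objects.requireNonNull(name), this);
    }

    /** True if this card is 'other' or a (transitive) specialization of it. */
    boolean isSpecializationOf(FeatureCard other) {
        for (FeatureCard c = this; c != null; c = c.parent) {
            if (c == other) {
                return true;
            }
        }
        return false;
    }

    String getName() {
        return name;
    }
}
```

With this, the repeat example would be built as
ROOT.specialize("repeat").specialize("Alu").specialize("AluJo"),
and the AluJo card would report itself as a specialization of
both the Alu card and the general repeat card.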
     
In this scheme, it may not be practical to have the double-binding
between FeatureCards and FeatureMappings.  If a FeatureMapping
kept a reference to its most specific FeatureCard, to go from
card to mappings, you could do something like:

     Sequence seq = ...
     Set<FeatureMapping> mappings = seq.filter(new FeatureFilter.ByCard(aluJo));

(Hopefully you could also apply that query to a SequenceDB or other
container, like you can with FeatureFilters in the current trunk
of BioJava 1.x).
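
As a rough illustration of how such a by-card query might behave,
here is a self-contained sketch.  It does not use the real
FeatureFilter classes: Card, Mapping and filterByCard() are
stand-ins invented for this example, and it assumes that a query
for a general card should also match mappings whose most specific
card is a specialization of it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; not the BioJava FeatureFilter API. */
final class ByCardSketch {
    /** Minimal stand-in for a FeatureCard with a single parent. */
    static final class Card {
        final String name;
        final Card parent; // null for the root card

        Card(String name, Card parent) {
            this.name = name;
            this.parent = parent;
        }

        /** True if this card is 'other' or a specialization of it. */
        boolean isA(Card other) {
            for (Card c = this; c != null; c = c.parent) {
                if (c == other) {
                    return true;
                }
            }
            return false;
        }
    }

    /** Stand-in for a FeatureMapping: a location plus its most specific card. */
    static final class Mapping {
        final int start;
        final int end;
        final Card card;

        Mapping(int start, int end, Card card) {
            this.start = start;
            this.end = end;
            this.card = card;
        }
    }

    /** Keeps the mappings whose card is 'wanted' or any specialization of it. */
    static List<Mapping> filterByCard(List<Mapping> mappings, Card wanted) {
        List<Mapping> hits = new ArrayList<>();
        for (Mapping m : mappings) {
            if (m.card.isA(wanted)) {
                hits.add(m);
            }
        }
        return hits;
    }
}
```

Because each mapping only points at its most specific card, the
card itself never needs to hold a list of mappings; the filter
walks up the hierarchy from each mapping instead.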


The attraction of this scheme is that it removes the concept
of feature `types' as opaque strings, and allows you to do
more meaningful things.  I guess that, at least for some uses,
we'd probably still want to keep stringy type properties in the
system for the benefit of Genbank/GFF/whatever dumpers, but
this could potentially be relegated to being a normal tag-value
type property stored in with the rest of the data.

This proposal generates loads of issues.  For instance:

   - Multiple vs. single inheritance? (my gut says multiple).

   - Do properties automatically get inherited from more
     to less general FeatureCards?

   - How does this play with feature hierarchy? (instinctively,
     I don't think this will be a problem.  The general `transcript'
     card contains the general `exon' card, just as the card for
     one specific transcript contains the cards for several specific
     exons).
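
For the first two questions, one possible answer (and only one of
several) is sketched below: cards may have several parents, and a
property lookup falls back to the nearest ancestor that defines
the key.  The class name InheritingCard, its methods, and the
breadth-first lookup order are all assumptions made for this
example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of multiple inheritance with inherited properties. */
final class InheritingCard {
    private final String name;
    private final List<InheritingCard> parents = new ArrayList<>();
    private final Map<String, Object> properties = new HashMap<>();

    InheritingCard(String name, InheritingCard... parents) {
        this.name = name;
        for (InheritingCard p : parents) {
            this.parents.add(p);
        }
    }

    String getName() {
        return name;
    }

    void setProperty(String key, Object value) {
        properties.put(key, value);
    }

    /**
     * Breadth-first lookup: this card's own value wins, then the
     * nearest ancestors in declaration order; null if nobody has it.
     */
    Object getProperty(String key) {
        Deque<InheritingCard> queue = new ArrayDeque<>();
        Set<InheritingCard> seen = new HashSet<>();
        queue.add(this);
        while (!queue.isEmpty()) {
            InheritingCard c = queue.remove();
            if (!seen.add(c)) {
                continue; // already visited via another parent
            }
            if (c.properties.containsKey(key)) {
                return c.properties.get(key);
            }
            queue.addAll(c.parents);
        }
        return null;
    }
}
```

A more specific card can still override an inherited value simply
by setting the same key itself.  What to do when two unrelated
parents disagree on a key is exactly the kind of issue that would
need deciding; the sketch just takes the first parent declared.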

Issues aside, does anyone like this idea?  Or think it's
completely stupid?

      Thomas.