[Biojava-dev] Feature interface change

Thomas Down td2@sanger.ac.uk
Thu, 22 Aug 2002 14:25:08 +0100


On Thu, Aug 22, 2002 at 10:42:10AM +0100, Matthew Pocock wrote:
> 
> This is part of an ongoing discussion Thomas and I have had. These 
> changes to features are not slated for any possible 1.3 release. They 
> may perhaps form part of BioJava 2.

Since BioJava 2 discussions seem to be getting slightly more
common and slightly more serious, we should maybe have a `wishlist'
page on the website for archiving suggestions and related discussion
(whether sublime or ridiculous).

Anyway, I like the split FeatureCard/FeatureMapping model (can
we use something like FeatureMapping, at least as a working title,
to avoid confusion with the old model?).  A few thoughts follow...

> The current model:
> 
> Sequence isa FeatureHolder
> 
> FeatureHolders isa Set<Feature>
> 
> Feature isa FeatureHolder
>         hasa Location
> 
> So, features are located via a Sequence,Location pair.
> 
> The new scheme would be something like:
> 
> Sequence isa FeatureHolder
> 
> FeatureHolders isa Set<Feature>
> 
> Feature isa FeatureHolder
>         hasa Location
>         hasa FeatureCard
> 
> FeatureCard hasa Set<Feature>

Hmmm, as you draw things out there, the relationships between Features
live on FeatureMappings rather than FeatureCards.  I'm not going to
say outright that this is wrong (at least for now), but I think we should
at least consider putting the relationships on the FeatureCards instead.
Taking our most common use case for hierarchical features
(exon child-of transcript), I can see both exons and transcripts
having stable FeatureCards with mappings onto multiple genome
assemblies (Ensembl already has stable IDs for both exons and
transcripts, and this works).  In this case, at least, it seems
to me that it makes most sense to put the relationships on
the FeatureCards.
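
To make that concrete, here is a rough sketch of relationships
living on the cards while each card carries its own mappings.
Everything in it (class names, fields, IDs, assembly names,
coordinates) is made up for illustration and is not existing
BioJava API:

```java
import java.util.LinkedHashSet;
import java.util.Set;

class CardRelationshipSketch {
    /** One placement of a card on one assembly; all fields are hypothetical. */
    static final class Mapping {
        final String assembly;
        final int start;
        final int end;

        Mapping(String assembly, int start, int end) {
            this.assembly = assembly;
            this.start = start;
            this.end = end;
        }
    }

    /** Stable, assembly-independent record of a feature. */
    static final class Card {
        final String stableId;
        final Set<Mapping> mappings = new LinkedHashSet<>();
        final Set<Card> children = new LinkedHashSet<>();

        Card(String stableId) {
            this.stableId = stableId;
        }
    }

    public static void main(String[] args) {
        Card transcript = new Card("transcript-1"); // made-up IDs
        Card exon = new Card("exon-1");

        // The relationship is stated once, on the cards.
        transcript.children.add(exon);

        // Each card is then placed on two assemblies (coordinates invented).
        transcript.mappings.add(new Mapping("assembly-A", 1000, 5000));
        transcript.mappings.add(new Mapping("assembly-B", 1200, 5200));
        exon.mappings.add(new Mapping("assembly-A", 1000, 1200));
        exon.mappings.add(new Mapping("assembly-B", 1200, 1400));

        System.out.println(transcript.children.size() + " child, "
                + exon.mappings.size() + " exon mappings");
    }
}
```

The point is that adding a third assembly only adds Mapping
objects; the exon/transcript link does not have to be restated.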

Since we're talking about revolution rather than evolution here, I wonder
if it might not also be time to reconsider the 1-to-many relationship
we currently have between parent and child features.  As a first step,
a card could have several parents as well as several children:


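    import java.util.Set;

    // BJ2Identifiable, FeatureType and FeatureMapping are placeholder types.
    public interface FeatureCard extends BJ2Identifiable {
        public FeatureType getType();
        public Set<FeatureMapping> getFeatureMappings(); // placements on sequences
        public Set<FeatureCard> getChildren();
        public Set<FeatureCard> getParents();            // plural: many-to-many
    }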

A more extreme option would be to generalize things completely and
just have `relationships' between features, and an (extensible)
vocabulary of relationship types.  This is done in the post-Cape Town
BioSQL schema, although I don't know of anything which really takes
advantage of it yet.  It's interesting, but we should probably look
for use-cases before deciding.
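
For the sake of discussion, the fully general version might look
something like the sketch below.  This is not the BioSQL layout,
just the general shape of `typed links plus an open vocabulary';
all the names (RelationshipType, RelationshipStore, the IDs) are
invented for the example:

```java
import java.util.ArrayList;
import java.util.List;

class RelationshipSketch {
    /** A vocabulary term; extending the vocabulary is just creating a new instance. */
    static final class RelationshipType {
        final String name;

        RelationshipType(String name) {
            this.name = name;
        }
    }

    /** One directed, typed link: subject --type--> object. */
    static final class Relationship {
        final String subjectId;
        final RelationshipType type;
        final String objectId;

        Relationship(String subjectId, RelationshipType type, String objectId) {
            this.subjectId = subjectId;
            this.type = type;
            this.objectId = objectId;
        }
    }

    /** Holds all links; parent/child is no longer special-cased. */
    static final class RelationshipStore {
        private final List<Relationship> links = new ArrayList<>();

        void relate(String subjectId, RelationshipType type, String objectId) {
            links.add(new Relationship(subjectId, type, objectId));
        }

        /** Objects linked from subjectId by this type (types compared by identity). */
        List<String> objectsOf(String subjectId, RelationshipType type) {
            List<String> result = new ArrayList<>();
            for (Relationship r : links) {
                if (r.subjectId.equals(subjectId) && r.type == type) {
                    result.add(r.objectId);
                }
            }
            return result;
        }
    }

    public static void main(String[] args) {
        RelationshipType childOf = new RelationshipType("child-of");
        RelationshipType overlaps = new RelationshipType("overlaps"); // a later addition

        RelationshipStore store = new RelationshipStore();
        store.relate("exon-1", childOf, "transcript-1");
        store.relate("exon-1", childOf, "transcript-2"); // exon shared by two transcripts
        store.relate("exon-1", overlaps, "repeat-1");

        System.out.println(store.objectsOf("exon-1", childOf));
    }
}
```

The cost is that getChildren()/getParents() turn into queries
over the links, which is part of why use-cases should come first.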

> In this case, the gene, exon, repeat object is the FeatureCard. All the 
> info specific to that type of biological feature goes into the 
> FeatureCard. The Feature object is all info about how it is anchored to 
> a specific region of the genome. So, whereas now we have methods like 
> getTranslation() on Feature, these would move to the FeatureCard. The 
> getStrand() method would stay on the Feature object as that is specific 
> to where it is bound into a bit of sequence.

This is interesting, since it makes FeatureMapping objects look
rather like the Location objects in some other projects/models.
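
A minimal sketch of that split, assuming the working names
FeatureCard and FeatureMapping from above.  The fields, the
Strand enum and the sequence names are my own placeholders, and
the translation string is a dummy value:

```java
class CardMappingSplitSketch {
    enum Strand { POSITIVE, NEGATIVE }

    /** Biology that does not depend on where the feature is placed. */
    static final class FeatureCard {
        private final String translation;

        FeatureCard(String translation) {
            this.translation = translation;
        }

        String getTranslation() {
            return translation;
        }
    }

    /** Little more than a location on a named sequence, plus a pointer to the card. */
    static final class FeatureMapping {
        private final FeatureCard card;
        private final String sequenceName;
        private final int start;
        private final int end;
        private final Strand strand;

        FeatureMapping(FeatureCard card, String sequenceName,
                       int start, int end, Strand strand) {
            this.card = card;
            this.sequenceName = sequenceName;
            this.start = start;
            this.end = end;
            this.strand = strand;
        }

        FeatureCard getCard() { return card; }
        String getSequenceName() { return sequenceName; }
        int getStart() { return start; }
        int getEnd() { return end; }
        Strand getStrand() { return strand; }
    }

    public static void main(String[] args) {
        FeatureCard gene = new FeatureCard("MAK"); // dummy translation
        FeatureMapping onA = new FeatureMapping(gene, "seq-A", 100, 400, Strand.POSITIVE);
        FeatureMapping onB = new FeatureMapping(gene, "seq-B", 900, 1200, Strand.NEGATIVE);

        // Strand differs per mapping; the translation is shared via the card.
        System.out.println(onA.getStrand() + " / " + onB.getStrand());
        System.out.println(onA.getCard().getTranslation()
                .equals(onB.getCard().getTranslation()));
    }
}
```

Strip the card reference from FeatureMapping and what is left is
essentially a sequence-plus-range location.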


     Thomas.