[Biojava-dev] Feature interface change

Dickson, Mike mdickson@netgenics.com
Thu, 22 Aug 2002 09:44:53 -0400


I was going to hold on to this and think some more, but I figured I'd assume
the audience will grant me permission to be wrong and fire away!

We did something similar to this with the original BSA submission to OMG.
In our case we used "Annotation" as the informational interface and
"Feature" as an annotation with location.  We also supported nesting and
grouping as you are with FeatureHolder.  Overall I liked the model, though it
got a bit messy trying to express it in IDL while staying away from some
then-controversial features (valuetypes).

There are a couple of differences.

We actually made a feature inherit from annotation. It also included the
location as attributes instead of in a has-a relationship as you suggest.
In this case a Sequence would be an "AnnotationHolder" and some of the
annotations are features since they have location.  It looks like bsane has
stayed pretty close to this model, though there's a lot more in there to
support fuzzy and remote locations, etc.

So you have the following interface hierarchy:

Sequence isa AnnotationHolder

AnnotationHolder isa Set<Annotation>

Annotation (base informational object)

Feature isa Annotation 
          hasa Location
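A minimal Java sketch of that hierarchy, purely illustrative: the method names (getType, getLocation, getMin, getMax), the Simple* classes and the features() helper are assumptions, not part of BSA or BioJava. It shows a Sequence holding a mix of plain annotations and located ones, with the Features picked out by type.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the BSA-style hierarchy; all names are assumptions.
interface Location {
    int getMin();
    int getMax();
}

// Base informational object.
interface Annotation {
    String getType();
}

// A Feature is an Annotation that also has a Location.
interface Feature extends Annotation {
    Location getLocation();
}

// An AnnotationHolder is simply a set of annotations.
interface AnnotationHolder extends Set<Annotation> {}

// A Sequence is an AnnotationHolder; some of its annotations are Features.
interface Sequence extends AnnotationHolder {}

class RangeLocation implements Location {
    private final int min, max;
    RangeLocation(int min, int max) { this.min = min; this.max = max; }
    public int getMin() { return min; }
    public int getMax() { return max; }
}

class SimpleAnnotation implements Annotation {
    private final String type;
    SimpleAnnotation(String type) { this.type = type; }
    public String getType() { return type; }
}

class SimpleFeature extends SimpleAnnotation implements Feature {
    private final Location location;
    SimpleFeature(String type, Location location) {
        super(type);
        this.location = location;
    }
    public Location getLocation() { return location; }
}

class SimpleSequence extends HashSet<Annotation> implements Sequence {}

final class Annotations {
    private Annotations() {}

    // Picks out the annotations that carry a location, i.e. the Features.
    static List<Feature> features(AnnotationHolder holder) {
        List<Feature> result = new ArrayList<>();
        for (Annotation a : holder) {
            if (a instanceof Feature) {
                result.add((Feature) a);
            }
        }
        return result;
    }
}
```

Because Feature isa Annotation, a single holder covers both kinds of information; callers that only care about located things filter by type.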

-----

You could do grouping in two ways, or you could do both.  We supported both in
BSA.  Actually you could make an Annotation an AnnotationHolder but I'm not
sure I see a case for sub-grouping in an annotation.  I'd have to get out
the doc and rethink the use cases to decide how I feel about the grouping
constructs now...

FeatureHolder isa Set<Feature>

Feature isa FeatureHolder

OR

LocationHolder isa Set<Location>

Location isa LocationHolder
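The first option is essentially the composite pattern: since a Feature is itself a FeatureHolder, features nest to any depth (a gene holding its exons, say). A rough Java sketch under that assumption; SimpleFeature, SimpleFeatureHolder and flatten() are hypothetical names, and the LocationHolder option would have the same shape with Location in place of Feature.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of grouping option 1: a Feature is also a FeatureHolder.
interface FeatureHolder extends Set<Feature> {}

interface Feature extends FeatureHolder {
    String getType();
}

// Top-level holder, standing in for a Sequence.
class SimpleFeatureHolder extends LinkedHashSet<Feature> implements FeatureHolder {}

class SimpleFeature extends LinkedHashSet<Feature> implements Feature {
    private final String type;
    SimpleFeature(String type) { this.type = type; }
    public String getType() { return type; }

    // Identity semantics: Set's content-based equals/hashCode would make two
    // empty sibling features equal and change the hash when children are added.
    @Override public boolean equals(Object o) { return this == o; }
    @Override public int hashCode() { return System.identityHashCode(this); }
}

final class Features {
    private Features() {}

    // Depth-first walk returning every feature nested under a holder.
    static List<Feature> flatten(FeatureHolder holder) {
        List<Feature> out = new ArrayList<>();
        for (Feature f : holder) {
            out.add(f);
            out.addAll(flatten(f));
        }
        return out;
    }
}
```

The identity-based equals()/hashCode() matter here: a holder that is also a Set otherwise compares by contents, which is rarely what you want for biological objects.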

Everything I'm laying out here is interface inheritance.  You could be
pretty flexible in an implementation with delegation of the interface to
another object (the common XxxxSupport pattern in Java).
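A small sketch of that delegation style, in the spirit of the JDK's java.beans.PropertyChangeSupport. To keep it short the holder interface here is narrowed to two methods rather than the full Set<Annotation> contract; AnnotationHolderSupport, Entity and Gene are hypothetical names.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

interface Annotation {
    String getType();
}

// Hypothetical, narrowed holder interface (the design above has it extend Set).
interface AnnotationHolder {
    boolean addAnnotation(Annotation a);
    Set<Annotation> getAnnotations();
}

// Reusable helper that does the real work.
final class AnnotationHolderSupport implements AnnotationHolder {
    private final Set<Annotation> annotations = new LinkedHashSet<>();
    public boolean addAnnotation(Annotation a) { return annotations.add(a); }
    public Set<Annotation> getAnnotations() {
        return Collections.unmodifiableSet(annotations);
    }
}

// Some existing base class the implementation wants to keep extending.
class Entity {
    private final String name;
    Entity(String name) { this.name = name; }
    String getName() { return name; }
}

// Gene keeps its own superclass and satisfies AnnotationHolder by forwarding.
class Gene extends Entity implements AnnotationHolder {
    private final AnnotationHolderSupport support = new AnnotationHolderSupport();
    Gene(String name) { super(name); }
    public boolean addAnnotation(Annotation a) { return support.addAnnotation(a); }
    public Set<Annotation> getAnnotations() { return support.getAnnotations(); }
}
```

The interface inheritance stays intact while the implementation inheritance slot is left free for something else.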

Ultimately I agree with you that things like a Gene and so on are first
class objects.  The downside to the hierarchy above is that with this mixin
approach you get a potential problem with implementations.  It's not so bad in
a language that can do multiple implementation inheritance (I won't even
comment on whether this is good or bad), but in Java it's problematic.  For
example, Gene inherits Annotation, but if you want a Gene that is also a
Feature you need another implementation that adds that, i.e. GeneFeature.  You
can get around this with composition (Feature hasa Annotation), but then you
lose the generic AnnotationHolder behaviour.  If you're willing to give up on
that (maybe Sequence isa AnnotationHolder and FeatureHolder), then you can
break the inheritance and do composition instead.

Obviously just a bunch of random thoughts, but it's been hashed out before, and
if we're to revisit the model in a more major way I'd like to see us factor
some of the other work in.

Mike
 
> -----Original Message-----
> From: Matthew Pocock [mailto:matthew_pocock@yahoo.co.uk]
> Sent: Thursday, August 22, 2002 5:42 AM
> To: biojava-dev@biojava.org
> Subject: [Biojava-dev] Feature interface change
> 
> Hi.
> 
> This is part of an ongoing discussion Thomas and I have had. These
> changes to features are not slated for any possible 1.3 release. They
> may perhaps form part of BioJava 2.
> 
> The idea:
> 
> Entities like genes or repeat types are really first-class objects. You
> could have all sorts of information attached to a Gene - phenotypes,
> diseases, all that biological stuff. You can have hierarchies or
> ontologies of these terms. They really exist independently of any
> 'material view' on a bit of DNA sequence.
> 
> Features on sequences live within some sequence/feature space as
> modelled variously by bio{perl,python,sql,java,corba}: some sort of
> coordinate system. In this world, a single gene may be represented by
> multiple features, one for each coordinate system it is found in -
> chromosome, clone, EMBL file, etc.
> 
> It would be nice if we could model the semantically rich descriptive
> object as a single shared entity, bound into multiple sequence
> contexts.
> 
> The current model:
> 
> Sequence isa FeatureHolder
> 
> FeatureHolder isa Set<Feature>
> 
> Feature isa FeatureHolder
>          hasa Location
> 
> So, features are located via a (Sequence, Location) pair.
> 
> The new scheme would be something like:
> 
> Sequence isa FeatureHolder
> 
> FeatureHolder isa Set<Feature>
> 
> Feature isa FeatureHolder
>          hasa Location
>          hasa FeatureCard
> 
> FeatureCard hasa Set<Feature>
> 
> In this case, the gene, exon, or repeat object is the FeatureCard. All
> the info specific to that type of biological feature goes into the
> FeatureCard. The Feature object holds all the info about how it is
> anchored to a specific region of the genome. So, whereas now we have
> methods like getTranslation() on Feature, these would move to the
> FeatureCard. The getStrand() method would stay on the Feature object, as
> that is specific to where it is bound into a bit of sequence.
> 
> This way, when feature information is projected into different
> coordinate systems (via assemblies or DAS or whatever), the exact same
> FeatureCard instance can be returned, and when you parse an EMBL record
> or look up what's on a micro-array spot, the same FeatureCard instance
> could be reused. The names are bad, but that is easily improved.
> 
> Any thoughts anyone?
> 
> Matthew
> 
> _______________________________________________
> biojava-dev mailing list
> biojava-dev@biojava.org
> http://biojava.org/mailman/listinfo/biojava-dev
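The FeatureCard scheme from the quoted proposal can be sketched in Java roughly as follows. This is a hypothetical illustration: only getTranslation() and getStrand() come from the proposal; Strand, bind(), getFeatureCard() and the Simple* classes are assumed names, and the Feature-isa-FeatureHolder nesting is left out to keep it short.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of the proposed Feature/FeatureCard split.
enum Strand { POSITIVE, NEGATIVE }

interface Location {
    int getMin();
    int getMax();
}

// The biological object (gene, exon, repeat) shared by every placement.
interface FeatureCard {
    String getTranslation();
    Set<Feature> getFeatures(); // FeatureCard hasa Set<Feature>
}

// How one FeatureCard is anchored into one particular sequence context.
interface Feature {
    Location getLocation();
    Strand getStrand();
    FeatureCard getFeatureCard();
}

final class SimpleLocation implements Location {
    private final int min, max;
    SimpleLocation(int min, int max) { this.min = min; this.max = max; }
    public int getMin() { return min; }
    public int getMax() { return max; }
}

final class SimpleFeature implements Feature {
    private final Location location;
    private final Strand strand;
    private final FeatureCard card;
    SimpleFeature(Location location, Strand strand, FeatureCard card) {
        this.location = location;
        this.strand = strand;
        this.card = card;
    }
    public Location getLocation() { return location; }
    public Strand getStrand() { return strand; }
    public FeatureCard getFeatureCard() { return card; }
}

final class SimpleFeatureCard implements FeatureCard {
    private final String translation;
    private final Set<Feature> features = new LinkedHashSet<>();
    SimpleFeatureCard(String translation) { this.translation = translation; }
    public String getTranslation() { return translation; }
    public Set<Feature> getFeatures() { return Collections.unmodifiableSet(features); }

    // Hypothetical factory: anchors this card at a location/strand in some
    // coordinate system and records the resulting Feature.
    Feature bind(Location location, Strand strand) {
        Feature feature = new SimpleFeature(location, strand, this);
        features.add(feature);
        return feature;
    }
}
```

Projecting into another coordinate system then amounts to binding the same card again, so every Feature hands back the identical FeatureCard instance.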