[Biojava-dev] Feature interface change

Matthew Pocock matthew_pocock@yahoo.co.uk
Thu, 22 Aug 2002 10:42:10 +0100


Hi.

This is part of an ongoing discussion Thomas and I have had. These 
changes to features are not slated for any possible 1.3 release. They 
may perhaps form part of BioJava 2.

The idea:

Entities like genes or repeat types are really first-class objects. You 
could have all sorts of information attached to a Gene - phenotypes, 
diseases, all that biological stuff. You can have hierarchies or 
ontologies of these terms. They really exist independently of any 
'material view' on a bit of DNA sequence.

Features on sequences live within some sequence/feature space as 
modelled variously by bio{perl,python,sql,java,corba} - some sort of 
coordinate system. In this world, a single gene may be represented by 
multiple features, one for each coordinate system it is found in - 
chromosome, clone, EMBL file, etc.

It would be nice if we could model the semantically rich descriptive 
object as a single shared entity, bound into multiple sequence 
contexts.

The current model:

Sequence isa FeatureHolder

FeatureHolder isa Set<Feature>

Feature isa FeatureHolder
         hasa Location

So, features are located via a (Sequence, Location) pair.
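
As a rough illustration, the current model could be sketched in Java 
along these lines. The classes are simplified, hypothetical stand-ins 
and not the real BioJava interfaces; getFeatures() and add() in 
particular are made up for the sketch.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical, simplified stand-ins for illustration only.
final class Location {
    final int min;
    final int max;
    Location(int min, int max) { this.min = min; this.max = max; }
}

// FeatureHolder isa Set<Feature> (modelled here by wrapping a set).
class FeatureHolder {
    private final Set<Feature> features = new LinkedHashSet<>();
    void add(Feature f) { features.add(f); }
    Set<Feature> getFeatures() { return Collections.unmodifiableSet(features); }
}

// Sequence isa FeatureHolder.
class Sequence extends FeatureHolder {
    final String name;
    Sequence(String name) { this.name = name; }
}

// Feature isa FeatureHolder, hasa Location.
// It is located by the (Sequence, Location) pair it carries.
class Feature extends FeatureHolder {
    final Sequence sequence;
    final Location location;
    Feature(Sequence sequence, Location location) {
        this.sequence = sequence;
        this.location = location;
    }
}
```

Because a Feature is itself a FeatureHolder, an exon can be added as a 
child of a gene feature, which in turn sits on the Sequence.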

The new scheme would be something like:

Sequence isa FeatureHolder

FeatureHolder isa Set<Feature>

Feature isa FeatureHolder
         hasa Location
         hasa FeatureCard

FeatureCard hasa Set<Feature>

In this case, the gene, exon or repeat object is the FeatureCard. All the 
info specific to that type of biological feature goes into the 
FeatureCard. The Feature object holds all the info about how it is 
anchored to a specific region of the genome. So, whereas now we have 
methods like getTranslation() on Feature, these would move to the 
FeatureCard. The getStrand() method would stay on the Feature object, as 
that is specific to where it is bound into a bit of sequence.
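
To make the split concrete, here is a minimal Java sketch of the 
proposed shape. Only the names Feature, FeatureCard, Location, 
getTranslation() and getStrand() come from the description above; the 
Strand enum, getCard(), getFeatures(), bind() and the constructor 
arguments are assumptions for illustration, not a settled API.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical strand type for the sketch.
enum Strand { POSITIVE, NEGATIVE, UNKNOWN }

final class Location {
    final int min;
    final int max;
    Location(int min, int max) { this.min = min; this.max = max; }
}

// The biological entity: everything that is true of the gene itself,
// regardless of which sequence it is bound to.
class FeatureCard {
    private final String name;
    private final String translation;
    // FeatureCard hasa Set<Feature>: every place this entity is anchored.
    private final Set<Feature> features = new LinkedHashSet<>();

    FeatureCard(String name, String translation) {
        this.name = name;
        this.translation = translation;
    }

    String getName() { return name; }
    String getTranslation() { return translation; }
    Set<Feature> getFeatures() { return Collections.unmodifiableSet(features); }

    // Called by Feature when it is created (assumed bookkeeping).
    void bind(Feature f) { features.add(f); }
}

// The anchoring: where, and on which strand, the entity sits on one sequence.
class Feature {
    private final String sequenceName;
    private final Location location;
    private final Strand strand;
    private final FeatureCard card;

    Feature(String sequenceName, Location location, Strand strand, FeatureCard card) {
        this.sequenceName = sequenceName;
        this.location = location;
        this.strand = strand;
        this.card = card;
        card.bind(this); // register this anchoring with the shared card
    }

    String getSequenceName() { return sequenceName; }
    Location getLocation() { return location; }
    Strand getStrand() { return strand; }
    FeatureCard getCard() { return card; }
}
```

With this layout, a translation is asked of feature.getCard(), while 
strand and location are asked of the Feature itself.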

This way, when feature information is projected into different 
coordinate systems (via assemblies or DAS or whatever), the exact same 
FeatureCard instance can be returned, and when you parse an EMBL record 
or look up what's on a micro-array spot, the same FeatureCard instance 
could be reused. The names are bad, but that is easily improved.
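
A small Java sketch of what reusing the same instance under projection 
might look like. The projectTo() method and the plain integer offset 
are assumptions made purely for illustration; real projection through 
an assembly or DAS would also have to deal with strand flips, gaps and 
so on.

```java
// Hypothetical minimal types, just enough to show the shared card.
final class FeatureCard {
    final String name;
    FeatureCard(String name) { this.name = name; }
}

final class Feature {
    final String coordinateSystem;
    final int min;
    final int max;
    final FeatureCard card;

    Feature(String coordinateSystem, int min, int max, FeatureCard card) {
        this.coordinateSystem = coordinateSystem;
        this.min = min;
        this.max = max;
        this.card = card;
    }

    // Re-express this feature in another coordinate system by shifting
    // its coordinates. A new Feature is created, but the card is passed
    // through untouched, so both features share one FeatureCard.
    Feature projectTo(String targetSystem, int offset) {
        return new Feature(targetSystem, min + offset, max + offset, card);
    }
}
```

The projected Feature is a different object with different 
coordinates, yet comparing the two cards with == gives true.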

Any thoughts anyone?

Matthew
