[GMOD-devel] Re: [Open-bio-l] Schema for genes & features & mappings to assemblies

Lincoln Stein lstein@cshl.org
Tue, 23 Apr 2002 11:20:35 -0400


On Tuesday 23 April 2002 07:07, Elia Stupka wrote:
> > Do you really want to special-case gene structures?  I thought
>
> Hmm... I agree with you, I like that, I guess then what we need to work on
> is the clever code that would drive it. Coincidentally we are just
> discussing super-non-hierarchical features for our comparative analysis
> db, so we might end up coding this, if we want it all to run outside
> ensembl on the bioperl-pipeline.
>
> Elia

The approach I took with Bio::DB::GFF is the following:

	- all features are stored as tag/values in a single table (normalized for
		tag names)

	- a series of "aggregator" classes are responsible for taking certain
	sets of tags and constructing rich objects from them.  For example, the
	Bio::DB::GFF::Aggregator::transcript class looks for tags named 
	"exon", "cds", "polyA-site" and so forth and uses them to construct a
	transcript object.

	- you can create your own aggregators on the fly using an aggregatorFactory,
	or use "static" aggregators stored in .pm files.
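
To make the aggregator idea concrete, here is a rough sketch in Python. It is not the Bio::DB::GFF API: Feature, Composite, make_aggregator and the sample rows are all hypothetical names invented for illustration. It only shows the general pattern of keeping flat rows in one table and letting an aggregator, built on the fly from a list of tag names, assemble them into a richer object.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Feature:
    """One row of the single feature table (hypothetical layout)."""
    group: str  # name of the object this row belongs to, e.g. a transcript
    tag: str    # kind of feature: "exon", "cds", "polyA-site", ...
    start: int
    end: int


@dataclass
class Composite:
    """A rich object assembled from flat rows by an aggregator."""
    kind: str
    name: str
    parts: dict = field(default_factory=dict)  # tag -> list of Feature

    @property
    def start(self):
        return min(f.start for fs in self.parts.values() for f in fs)

    @property
    def end(self):
        return max(f.end for fs in self.parts.values() for f in fs)


def make_aggregator(kind, part_tags):
    """Build an aggregator on the fly from the tags it should collect."""
    part_tags = frozenset(part_tags)

    def aggregate(rows):
        # group name -> tag -> rows carrying that tag
        grouped = defaultdict(lambda: defaultdict(list))
        for row in rows:
            if row.tag in part_tags:  # rows with other tags are skipped
                grouped[row.group][row.tag].append(row)
        return [
            Composite(kind, name,
                      {tag: sorted(fs, key=lambda f: f.start)
                       for tag, fs in parts.items()})
            for name, parts in sorted(grouped.items())
        ]

    return aggregate


rows = [
    Feature("tx1", "exon", 100, 200),
    Feature("tx1", "exon", 300, 400),
    Feature("tx1", "cds", 150, 350),
    Feature("tx1", "polyA-site", 420, 420),
    Feature("tx2", "exon", 900, 950),
    Feature("tx1", "exno", 500, 600),  # unknown tag: this aggregator ignores it
]

transcript = make_aggregator("transcript", ["exon", "cds", "polyA-site"])
transcripts = transcript(rows)
for t in transcripts:
    print(t.kind, t.name, t.start, t.end, sorted(t.parts))
```

Adding a new component to the transcript here means adding one more tag name to the list passed to make_aggregator; the table layout itself does not change.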

I think this is similar to Jason's recent Builder interface.  The strategy 
has pluses and minuses.  The plus is that you don't have to futz with the 
schema every time you want to add a new component to your gene.  The minus is 
that it's easy for the database to drift, since nothing enforces referential integrity.  
There's also a whiff of the AceDB "magic tag" syndrome here.

Lincoln