[Open-bio-l] Schema for genes & features & mappings to assemblies

Ewan Birney birney@ebi.ac.uk
Tue, 23 Apr 2002 11:47:01 +0100 (BST)


On Tue, 23 Apr 2002, Thomas Down wrote:

> On Tue, Apr 23, 2002 at 10:09:09AM +0100, Ewan Birney wrote:
> >
> > We do need to discuss assemblies. I vote for "flat" one-level assemblies
> > (a set of contigs forms a chromosome), a la Ensembl, as I believe that the
> > assumed hierarchical nature of assemblies (a) is mainly a consequence of
> > how they are put together, and the intermediates in the hierarchies between
> > contigs of DNA and chromosomes are almost never stable, and (b) means you
> > always have to use software to do conversions and can never do it easily
> > with SQL (PL/SQL probably can...).
> 
> I think that's actually the crux of the assembly debate.  If you
> pick a schema which supports multi-level assemblies, nobody is
> actually forcing you to /use/ that capability.  If you have a 
> naturally one-level assembly, you can stick to that.

But you can't assume people will make the same assumptions about this -
i.e., to allow generic binding to *any* BioSQL database, you have to go
multilevel.


> 
> However, if you're keen to put as much of your assembly-munging
> as possible in SQL, that really forces a `fixed-number-of-levels'
> assembly, like Ensembl's denormalized two-level system.  My thinking
> on this is coloured by the fact that I've personally always worked
> with code which does assembly in memory (in BioJava, we're
> keen to keep the feature projection code quite separate
> from any specific database technology -- we hate fixing off-by-one
> errors :-).
> 
> Does Ensembl get any big performance boosts from using in-database
> assembly?
> 

We perceived that we did, but didn't test it. It gives people options
about how to handle the coordinate mapping.


I still prefer 1 level, as I think n levels is just asking for
obfuscation and prevents people from easily treating the database as "just
the data" without having any code dependencies.
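
To make the "just the data" point concrete, here is a minimal sketch of a
one-level assembly where the contig-to-chromosome projection is a single
SQL join. The table and column names (assembly, feature, contig_ori, ...)
are made up for illustration and are not the Ensembl or BioSQL schema. It
also assumes 1-based inclusive coordinates and that each contig is placed
whole on the chromosome.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical flat assembly."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- One row per contig: where it sits on the chromosome.
        CREATE TABLE assembly (
            contig_id  TEXT PRIMARY KEY,
            chr_name   TEXT NOT NULL,
            chr_start  INTEGER NOT NULL,  -- 1-based, inclusive
            chr_end    INTEGER NOT NULL,
            contig_ori INTEGER NOT NULL   -- +1 forward, -1 reversed
        );
        -- Features are stored in contig coordinates.
        CREATE TABLE feature (
            feature_id   INTEGER PRIMARY KEY,
            contig_id    TEXT NOT NULL REFERENCES assembly(contig_id),
            contig_start INTEGER NOT NULL,
            contig_end   INTEGER NOT NULL
        );
        INSERT INTO assembly VALUES ('ctgA', 'chr1', 1,    1000,  1);
        INSERT INTO assembly VALUES ('ctgB', 'chr1', 1001, 1500, -1);
        INSERT INTO feature VALUES (1, 'ctgA', 10, 20);
        INSERT INTO feature VALUES (2, 'ctgB', 10, 20);
    """)
    return conn


# With one level, the projection is plain arithmetic inside one join.
PROJECT_SQL = """
SELECT f.feature_id,
       a.chr_name,
       CASE WHEN a.contig_ori = 1
            THEN a.chr_start + f.contig_start - 1
            ELSE a.chr_end - f.contig_end + 1
       END AS proj_start,
       CASE WHEN a.contig_ori = 1
            THEN a.chr_start + f.contig_end - 1
            ELSE a.chr_end - f.contig_start + 1
       END AS proj_end
FROM feature f
JOIN assembly a ON a.contig_id = f.contig_id
ORDER BY f.feature_id
"""


def project_features(conn):
    """Return (feature_id, chr_name, start, end) in chromosome coordinates."""
    return conn.execute(PROJECT_SQL).fetchall()


if __name__ == "__main__":
    for row in project_features(build_demo_db()):
        print(row)
```

With n levels the same query would need one join per level, or a
recursive walk in code, which is the dependency argued against above.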





>     Thomas.
> _______________________________________________
> Open-Bio-l mailing list
> Open-Bio-l@open-bio.org
> http://open-bio.org/mailman/listinfo/open-bio-l
> 

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>. 
-----------------------------------------------------------------