[Open-bio-l] Schema for genes & features & mappings to assemblies

Thomas Down td2@sanger.ac.uk
Tue, 23 Apr 2002 11:01:08 +0100


On Tue, Apr 23, 2002 at 10:09:09AM +0100, Ewan Birney wrote:
>
> We do need to discuss assemblies. I vote for "flat" one level assemblies
> (set of contigs form a chromosome), a la Ensembl, as I believe that the
> assumed hierarchical nature of assemblies is (a) mainly a consequence of
> how it is put together and the intermediates in the hierarchies between
> contigs of DNA and chromosomes are nearly never stable (b) means you
> always have to use software to do conversions and can never do it easily
> with SQL (PL/SQL probably can...).

I think that's actually the crux of the assembly debate.  If you
pick a schema which supports multi-level assemblies, nobody is
actually forcing you to /use/ that capability.  If you have a 
naturally one-level assembly, you can stick to that.
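
As a rough illustration of that point, here is a minimal in-memory
sketch in Python. It is not taken from BioJava, Ensembl or any real
schema: the Placement class, the project() function and all sequence
names are hypothetical, and coordinates are assumed to be 1-based and
inclusive. Each sequence simply records where it sits on its parent,
so a flat assembly is just the case where every chain has length one.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class Placement:
    """Where a child sequence sits on its parent (1-based, inclusive)."""
    parent: str
    parent_start: int  # parent coordinate of the child's leftmost base
    length: int        # length of the child sequence
    strand: int = 1    # +1 same orientation as parent, -1 reversed


def project(name: str, pos: int,
            placements: Dict[str, Placement]) -> Tuple[str, int]:
    """Map a position upwards until a sequence with no parent is reached."""
    while name in placements:
        p = placements[name]
        if p.strand == 1:
            pos = p.parent_start + pos - 1
        else:
            # Reversed child: its first base lands on the last parent base.
            pos = p.parent_start + p.length - pos
        name = p.parent
    return name, pos


# One-level ("flat") assembly: contigs placed directly on a chromosome.
FLAT = {"contig1": Placement("chr1", 1001, 500)}

# Two-level assembly: contig -> scaffold -> chromosome, same code path.
NESTED = {
    "contig1": Placement("scaffoldA", 101, 500),
    "scaffoldA": Placement("chr1", 1001, 5000),
}

print(project("contig1", 10, FLAT))    # ('chr1', 1010)
print(project("contig1", 10, NESTED))  # ('chr1', 1110)
```

The same loop serves both cases; the only cost of allowing extra
levels is that the walk is done in code rather than in a single query.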

However, if you're keen to put as much of your assembly-munging
as possible in SQL, that really forces a `fixed-number-of-levels'
assembly, like Ensembl's denormalized two-level system.  My thinking
on this is coloured by the fact that I've personally always worked
with code which does assembly in memory (in BioJava, we're
keen to keep the feature projection code quite separate
from any specific database technology -- we hate fixing off-by-one
errors :-).
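
To make the SQL side of the trade-off concrete, here is a small sketch
using Python's sqlite3 module. The table and column names (assembly,
feature, chr_start and so on) are invented for illustration and are
not Ensembl's actual schema; it also assumes 1-based coordinates and
forward-strand contigs only. With exactly one level between contig and
chromosome, the whole conversion fits in a single join.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a toy one-level assembly with two contigs and two features."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE assembly (
            contig     TEXT PRIMARY KEY,
            chromosome TEXT NOT NULL,
            chr_start  INTEGER NOT NULL  -- chromosome position of contig base 1
        );
        CREATE TABLE feature (
            name         TEXT NOT NULL,
            contig       TEXT NOT NULL REFERENCES assembly(contig),
            contig_start INTEGER NOT NULL,
            contig_end   INTEGER NOT NULL
        );
        INSERT INTO assembly VALUES ('c1', 'chr1', 1), ('c2', 'chr1', 1001);
        INSERT INTO feature VALUES ('geneA', 'c1', 100, 200),
                                   ('geneB', 'c2', 50, 80);
    """)
    return conn


def features_on_chromosome(conn: sqlite3.Connection):
    """Project every feature to chromosome coordinates with one join."""
    return conn.execute("""
        SELECT f.name,
               a.chromosome,
               a.chr_start + f.contig_start - 1,
               a.chr_start + f.contig_end - 1
        FROM feature AS f
        JOIN assembly AS a ON a.contig = f.contig
        ORDER BY 3
    """).fetchall()


conn = build_demo_db()
for row in features_on_chromosome(conn):
    print(row)
# ('geneA', 'chr1', 100, 200)
# ('geneB', 'c2' features land at 1050..1080: ('geneB', 'chr1', 1050, 1080)
```

Each extra level would need another join written into the query, which
is why the number of levels has to be fixed when the schema is designed.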

Does Ensembl get any big performance boosts from using in-database
assembly?

    Thomas.