[Open-bio-l] Schema for genes & features & mappings to assemblies

Thomas Down td2@sanger.ac.uk
Tue, 23 Apr 2002 10:54:23 +0100


On Tue, Apr 23, 2002 at 05:24:09PM +0800, Elia Stupka wrote:
> 
> > We do need to discuss assemblies. I vote for "flat" one level assemblies
> 
> I guess the other bit missing from biosql at the moment is gene
> structures, if we really want to start thinking about being able to do
> everything with biosql alone.

Do you really want to special-case gene structures?  I thought
that the `idea' of BioSQL was to put everything into a single
feature table, using tag-value fields for all the non-core bits
of data on each feature, and an ontology to hold the whole lot
together.

Remember -- we have hierarchical features.  Isn't that enough
to represent gene structures?  Once you start adding
gene/exon/transcript/etc. tables, you end up with...  Ensembl!
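A minimal sketch of what "hierarchical features are enough" could look like, assuming a heavily simplified schema: the `term`, `feature` and `feature_tag` tables and the helpers `add_feature` and `children` are hypothetical and are not BioSQL's actual tables. A gene structure is just gene -> transcript -> exon rows in the one feature table, linked by parent pointers, typed by ontology terms, with everything else in tag-value rows.

```python
import sqlite3

# Hypothetical, simplified single-feature-table layout (NOT the real BioSQL schema).
SCHEMA = """
CREATE TABLE term (term_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    parent_id  INTEGER REFERENCES feature(feature_id),  -- hierarchy link
    type_id    INTEGER REFERENCES term(term_id),        -- ontology term
    seq_start  INTEGER, seq_end INTEGER, strand INTEGER);
CREATE TABLE feature_tag (
    feature_id INTEGER REFERENCES feature(feature_id),
    tag TEXT, value TEXT);
"""

def add_feature(db, type_name, start, end, strand, parent=None, **tags):
    """Insert one feature, creating its type term if needed; extra kwargs become tags."""
    db.execute("INSERT OR IGNORE INTO term(name) VALUES (?)", (type_name,))
    (type_id,) = db.execute(
        "SELECT term_id FROM term WHERE name = ?", (type_name,)).fetchone()
    cur = db.execute(
        "INSERT INTO feature(parent_id, type_id, seq_start, seq_end, strand) "
        "VALUES (?, ?, ?, ?, ?)", (parent, type_id, start, end, strand))
    fid = cur.lastrowid
    db.executemany("INSERT INTO feature_tag VALUES (?, ?, ?)",
                   [(fid, k, str(v)) for k, v in tags.items()])
    return fid

def children(db, fid, type_name):
    """Direct sub-features of a given type, ordered by start coordinate."""
    return db.execute(
        """SELECT f.feature_id, f.seq_start, f.seq_end FROM feature f
           JOIN term t ON t.term_id = f.type_id
           WHERE f.parent_id = ? AND t.name = ? ORDER BY f.seq_start""",
        (fid, type_name)).fetchall()

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
gene = add_feature(db, "gene", 100, 900, 1, name="abc1")
tx = add_feature(db, "transcript", 100, 900, 1, parent=gene)
for s, e in [(100, 200), (400, 550), (800, 900)]:
    add_feature(db, "exon", s, e, 1, parent=tx)

exons = children(db, tx, "exon")
print([(s, e) for _, s, e in exons])  # [(100, 200), (400, 550), (800, 900)]
```

No gene, transcript or exon tables are needed: adding a new kind of feature only means adding a new term.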

> >   (b) zero level (Lincoln likes this). The schema stores contigs as
> > "features" on DNA Sequences which are chromosome length.
> 
> But with zero level, reverse-engineering is hard if you want to, for
> example, do a local update, right?
> 
> I think zero level is suitable for what comes later, data mining, which is
> what we are planning to do for our multi-genome data-mining
> pipeline. By that stage you really couldn't care less why the
> coordinates are what they are; you just want to use them (a la
> ensembl-lite).
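To make the zero-level arrangement concrete, here is a minimal sketch, assuming each contig is stored as a feature on a chromosome-length sequence with 1-based inclusive coordinates and a strand. The `ContigPlacement` record and the two mapping functions are hypothetical names. Mapping contig positions onto the chromosome is plain arithmetic; mapping back (what a local update needs) only works as long as the placement records are kept alongside the chromosome coordinates.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ContigPlacement:
    """Hypothetical record: a contig stored as a feature on a chromosome."""
    contig: str
    chrom: str
    chrom_start: int  # 1-based, inclusive
    chrom_end: int    # 1-based, inclusive
    strand: int       # +1 or -1: contig orientation on the chromosome

def contig_to_chrom(p: ContigPlacement, contig_pos: int) -> int:
    """Map a 1-based contig position to a 1-based chromosome position."""
    length = p.chrom_end - p.chrom_start + 1
    if not 1 <= contig_pos <= length:
        raise ValueError("position lies outside the contig")
    if p.strand == 1:
        return p.chrom_start + contig_pos - 1
    return p.chrom_end - contig_pos + 1

def chrom_to_contig(placements, chrom, pos):
    """Reverse mapping; returns (contig, contig_pos) or None if pos is in a gap."""
    for p in placements:
        if p.chrom == chrom and p.chrom_start <= pos <= p.chrom_end:
            if p.strand == 1:
                return p.contig, pos - p.chrom_start + 1
            return p.contig, p.chrom_end - pos + 1
    return None

fwd = ContigPlacement("ctgA", "chr1", 10_001, 15_000, 1)
rev = ContigPlacement("ctgB", "chr1", 15_101, 20_000, -1)
print(contig_to_chrom(fwd, 1), contig_to_chrom(fwd, 5_000))  # 10001 15000
print(contig_to_chrom(rev, 1), contig_to_chrom(rev, 4_900))  # 20000 15101
print(chrom_to_contig([fwd, rev], "chr1", 15_050))           # None (gap)
```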

One thing to remember with zero-level arrangements: you're
potentially going to want to store whole chromosome sequences.
Many databases will not handle this well, especially if you
then want to go back and efficiently pull out a small region
from the middle of chromosome 1.

One solution would be to have a new sequence-storage type in BioSQL
(an alternative to the existing biosequence table), which stores
the sequence in "shredded" (small chunks) form.  This is different
from assemblies, in that the use of shredded sequence behind
the scenes should be completely hidden from the user.  I remember
talking to someone (Lincoln, I think) about this at Cape Town.
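A minimal sketch of shredded storage, assuming a fixed chunk size of 1000 bases; the `seq_chunk` table and the `store` and `subseq` helpers are hypothetical and not part of BioSQL. The caller only ever asks for a 1-based inclusive region, and `subseq` works out which chunks to read, so the shredding stays hidden behind that one call and a small region from the middle of a chromosome touches only one or two rows.

```python
import random
import sqlite3

CHUNK = 1000  # hypothetical chunk size in bases

SCHEMA = """
CREATE TABLE seq_chunk (
    seq_id   INTEGER,
    chunk_no INTEGER,   -- 0-based chunk index
    residues TEXT,
    PRIMARY KEY (seq_id, chunk_no));
"""

def store(db, seq_id, seq, chunk=CHUNK):
    """Shred a sequence into fixed-size chunks."""
    db.executemany(
        "INSERT INTO seq_chunk VALUES (?, ?, ?)",
        [(seq_id, i // chunk, seq[i:i + chunk]) for i in range(0, len(seq), chunk)])

def subseq(db, seq_id, start, end, chunk=CHUNK):
    """Return residues start..end (1-based, inclusive), reading only the chunks needed."""
    first, last = (start - 1) // chunk, (end - 1) // chunk
    rows = db.execute(
        "SELECT residues FROM seq_chunk WHERE seq_id = ? "
        "AND chunk_no BETWEEN ? AND ? ORDER BY chunk_no",
        (seq_id, first, last)).fetchall()
    joined = "".join(r for (r,) in rows)
    offset = (start - 1) - first * chunk  # position of `start` within `joined`
    return joined[offset:offset + (end - start + 1)]

# Usage: a random 25 kb "chromosome", stored shredded, then read back in pieces.
random.seed(0)
chrom = "".join(random.choice("ACGT") for _ in range(25_000))
db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
store(db, 1, chrom)
assert subseq(db, 1, 12_345, 12_400) == chrom[12_344:12_400]  # within one chunk
assert subseq(db, 1, 999, 2_001) == chrom[998:2_001]          # spans three chunks
```

The chunk size trades row count against wasted bytes per query; the sketch keeps it a parameter rather than claiming a best value.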


    Thomas.