[Open-bio-l] Schema for genes & features & mappings to assemblies

Ewan Birney birney@ebi.ac.uk
Tue, 23 Apr 2002 11:39:28 +0100 (BST)


On Tue, 23 Apr 2002, Thomas Down wrote:

> On Tue, Apr 23, 2002 at 05:24:09PM +0800, Elia Stupka wrote:
> > 
> > > We do need to discuss assemblies. I vote for "flat" one level assemblies
> > 
> > I guess the other bit missing from biosql at the moment is gene
> > structures, to really start thinking of being able to do things only with
> > biosql.
> 
> Do you really want to special-case gene structures?  I thought
> that the `idea' of BioSQL was to put everything into a single
> feature table, using tag-value fields for all the non-code bits
> of data on each feature, and an ontology to hold the whole lot
> together.
> 
> Remember -- we have hierarchical features.  Isn't that enough
> to do gene structures?  Once you start adding gene/exon/transcript
> /etc. tables, then you end up with...  Ensembl!

And is that such a bad thing?
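
For concreteness, here is a rough sketch of what the generic approach
Thomas describes might look like: one feature table with a parent
pointer for the hierarchy, plus a tag-value table for everything else.
The table and column names (feature, feature_tag, type_term, and so on)
are made up for illustration and are not the actual BioSQL schema; the
coordinates and the gene name are dummy data.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database holding one toy gene structure.

    All table and column names here are hypothetical, chosen only to
    illustrate the single-feature-table idea.
    """
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE feature (
            feature_id INTEGER PRIMARY KEY,
            parent_id  INTEGER REFERENCES feature(feature_id),
            type_term  TEXT    NOT NULL,  -- ontology term, e.g. 'exon'
            seq_start  INTEGER NOT NULL,
            seq_end    INTEGER NOT NULL,
            strand     INTEGER NOT NULL
        );
        CREATE TABLE feature_tag (
            feature_id INTEGER NOT NULL REFERENCES feature(feature_id),
            tag        TEXT    NOT NULL,
            value      TEXT    NOT NULL
        );
    """)
    # gene -> transcript -> exons, all rows of the same table
    rows = [
        (1, None, "gene",       1000, 5000, 1),
        (2, 1,    "transcript", 1000, 5000, 1),
        (3, 2,    "exon",       1000, 1200, 1),
        (4, 2,    "exon",       3000, 3400, 1),
        (5, 2,    "exon",       4800, 5000, 1),
    ]
    db.executemany("INSERT INTO feature VALUES (?, ?, ?, ?, ?, ?)", rows)
    db.execute("INSERT INTO feature_tag VALUES (1, 'name', 'demo_gene')")
    return db


def descendants(db, root_id, type_term):
    """Return (start, end) of every feature of the given type under root_id."""
    sql = """
        WITH RECURSIVE subtree(feature_id) AS (
            SELECT feature_id FROM feature WHERE feature_id = ?
            UNION ALL
            SELECT f.feature_id
            FROM feature f JOIN subtree s ON f.parent_id = s.feature_id
        )
        SELECT f.seq_start, f.seq_end
        FROM feature f JOIN subtree s ON f.feature_id = s.feature_id
        WHERE f.type_term = ?
        ORDER BY f.seq_start
    """
    return db.execute(sql, (root_id, type_term)).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    print(descendants(db, 1, "exon"))
```

The trade-off is visible in the query: "all exons of a gene" needs a
recursive walk of the hierarchy, where dedicated gene/transcript/exon
tables would make it a plain join.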

> 
> > >   (b) zero level (Lincoln likes this). The schema stores contigs as
> > > "features" on DNA Sequences which are chromosome length.
> > 
> > But with zero level reverse-engineering is hard, if you want to, for
> > example, do a local update, right?
> > 
> > I think zero level is suitable for what comes later, data mining, which is
> > what we are planning to do for our multi-genome data-mining
> > pipeline. Because by that stage you really cannot care less why the
> > coordinates are what they are, you just want to use them (a la
> > ensembl-lite)
> 
> One thing to remember with zero-level type arrangements: you're
> potentially going to want to store whole chromosome sequences.
> A lot of databases will not be happy about this, especially if you
> then want to go back and efficiently pull out a small region
> from the middle of chromosome 1.
> 
> One solution would be to have a new sequence-storage type in BioSQL
> (an alternative to the existing biosequence table), which stores
> the sequence in "shredded" (small chunks) form.  This is different
> from assemblies, in that the use of shredded sequence behind
> the scenes should be completely hidden from the user.  I remember
> talking to someone (Lincoln, I think) about this at Cape Town.
> 
> 
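
The "shredded" storage idea quoted above could be sketched roughly as
follows. This is only an illustration under stated assumptions: the
sequence_chunk table, the function names, the tiny chunk size, and the
0-based half-open coordinates are all invented here and are not part of
BioSQL. The point is that the caller asks for a region and never sees
the chunks.

```python
import sqlite3

CHUNK_SIZE = 10  # deliberately tiny for the demo; an assumed parameter


def store_shredded(db, seq_id, sequence, chunk_size=CHUNK_SIZE):
    """Split a sequence into fixed-size chunks and store one row per chunk."""
    db.execute("""
        CREATE TABLE IF NOT EXISTS sequence_chunk (
            seq_id      TEXT    NOT NULL,
            chunk_start INTEGER NOT NULL,  -- 0-based offset of first residue
            residues    TEXT    NOT NULL,
            PRIMARY KEY (seq_id, chunk_start)
        )
    """)
    rows = [
        (seq_id, offset, sequence[offset:offset + chunk_size])
        for offset in range(0, len(sequence), chunk_size)
    ]
    db.executemany("INSERT INTO sequence_chunk VALUES (?, ?, ?)", rows)


def fetch_region(db, seq_id, start, end, chunk_size=CHUNK_SIZE):
    """Return residues in the 0-based half-open interval [start, end).

    Only the chunks overlapping the interval are read from the database.
    """
    if start < 0 or end < start:
        raise ValueError("need 0 <= start <= end")
    # Offset of the chunk that contains position `start`.
    first_chunk = (start // chunk_size) * chunk_size
    rows = db.execute(
        """
        SELECT residues FROM sequence_chunk
        WHERE seq_id = ? AND chunk_start >= ? AND chunk_start < ?
        ORDER BY chunk_start
        """,
        (seq_id, first_chunk, end),
    ).fetchall()
    joined = "".join(row[0] for row in rows)
    # Trim the leading part of the first chunk and anything past `end`.
    skip = start - first_chunk
    return joined[skip:skip + (end - start)]


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    store_shredded(db, "chr_demo", "ACGT" * 25)
    print(fetch_region(db, "chr_demo", 37, 63))
```

With the primary key on (seq_id, chunk_start), pulling a small region
from the middle of a long sequence is an index range scan over a few
rows rather than a read of the whole sequence.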



>     Thomas.
> 

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>. 
-----------------------------------------------------------------