[Open-bio-l] Schema for genes & features & mappings to assemblies

Ewan Birney birney@ebi.ac.uk
Tue, 23 Apr 2002 09:09:35 +0100 (BST)


On Mon, 22 Apr 2002, Thomas Down wrote:

> On Mon, Apr 22, 2002 at 02:36:18PM -0700, Hilmar Lapp wrote:
> > 
> > 	- biosql is for sequences and features, not mappings to 
> >  assemblies (is that intended to be added, too, or is it beyond its scope? )
> 
> Hi...
> 
> There was a bit of discussion of assemblies at Cape Town,
> but I don't think it was terribly conclusive.
> 
> I did actually write a little BioSQL schema extension
> for supporting assemblies, and prototyped some support for
> it in the BioJava-BioSQL code.  So it's certainly possible.
> The particular approach I took wasn't universally well
> received, though, since I designed it to support nested
> assemblies (i.e. a fragment of an assembly can itself be
> an assembly).  That's nice in that you can actually model
> the whole assembly process (right down to the individual
> sequencing reads, if you feel like it). But does mean
> that all the `assembly-munging' code will probably need
> to be put in your object layer, rather than handling
> the assembly directly in the SQL queries.
> 
> If you think BioSQL + assemblies might fit your requirements,
> it could be worth re-starting this debate.

Yup - I agree with Thomas's view, and I think it would be fine to put
assemblies into BioSQL.
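
For concreteness, here is a rough sketch of what "assembly-munging in the
object layer" could look like for nested assemblies. It is only an
illustration in Python, not the BioJava-BioSQL code or Thomas's schema
extension: the class names (Read, Fragment, Assembly), the resolve()
function, the 0-based half-open coordinates and the 'N' gap filling are all
assumptions made up for the example.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Read:
    """Hypothetical leaf sequence, e.g. a single sequencing read."""
    name: str
    seq: str


@dataclass
class Fragment:
    """Hypothetical placement of part of a component onto its parent.

    Coordinates are 0-based and half-open (an assumption of this sketch).
    """
    component: Union["Read", "Assembly"]
    parent_start: int  # where the piece starts on the parent assembly
    comp_start: int    # where the piece starts within the component
    length: int


@dataclass
class Assembly:
    """Hypothetical assembly; its fragments may themselves be assemblies."""
    name: str
    fragments: List[Fragment] = field(default_factory=list)


def resolve(node, start, end):
    """Return the sequence for [start, end) of node.

    Recurses through nested assemblies down to the reads. Positions not
    covered by any fragment are filled with 'N'.
    """
    if isinstance(node, Read):
        return node.seq[start:end]
    out = ["N"] * (end - start)
    for f in node.fragments:
        # Overlap between the requested window and this fragment.
        lo = max(start, f.parent_start)
        hi = min(end, f.parent_start + f.length)
        if lo >= hi:
            continue
        # Translate the overlap into the component's own coordinates.
        c_lo = f.comp_start + (lo - f.parent_start)
        piece = resolve(f.component, c_lo, c_lo + (hi - lo))
        out[lo - start:lo - start + len(piece)] = piece
    return "".join(out)


# Two reads build a contig; part of the contig is placed on a chromosome.
r1 = Read("r1", "ACGTACGT")
r2 = Read("r2", "TTTTGGGG")
contig = Assembly("contig", [Fragment(r1, 0, 0, 8), Fragment(r2, 8, 4, 4)])
chrom = Assembly("chr", [Fragment(contig, 5, 2, 10)])
print(resolve(chrom, 0, 15))  # NNNNNGTACGTGGGG
```

The recursion is the point: each level has to be walked in code, which is
why this sort of lookup is awkward to express as a single SQL query.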


Ensembl's sweet spot is assemblies + automatic pipeline (a la Ensembl). Most
people get put off by how much "stuff" there is inside Ensembl, but in fact
the schema is pretty simple, and the complexity is mainly because it has
been an active project for 3 years with numerous silly deadlines dropped on
it from above. Finally, people get the heebie-jeebies because Ensembl is
such a big group with lots of internal drivers that they worry
they won't get any say and will just get swept along.



The big benefits are (a) schema and data which can be downloaded for
human, mouse, zebrafish, fugu (and soon... anopheles), and which are
guaranteed to work, (b) a very functional web site which is portable, and
(c) the ability to run automatic systems which scale up to "please
completely annotate this genome in 2 weeks".



BioSQL's advantages are that it is more project-neutral and the feature
ontology stuff is better worked out, whereas Ensembl is deliberately
"flat".




> 
>     Thomas.

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>. 
-----------------------------------------------------------------