[Open-bio-l] Schema for genes & features & mappings to assemblies

Hilmar Lapp hlapp@gnf.org
Mon, 22 Apr 2002 14:36:18 -0700


First off, apologies to all who get this twice or multiple times.

We're setting out to build a database to integrate genes, features, mappings between genes, mappings between genes and features, and mappings of features and genes to genome assemblies, all from different sources, for browsing, visualization, and fast queries (NOT for editing).
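To make those requirements concrete, here is a minimal sketch of the kind of read-only schema being described: genes and features tagged by source, gene-gene and gene-feature mappings, and placements of genes or features on genome assemblies, plus one range query of the sort a browser would issue. All table, column, and function names are hypothetical (they are not taken from biosql, Bio::DB::GFF, or Ensembl), and SQLite stands in for Oracle only so the sketch runs anywhere.

```python
import sqlite3

# Hypothetical schema: names are illustrative only; SQLite stands in for Oracle.
SCHEMA = """
CREATE TABLE source (
    source_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL UNIQUE                 -- originating database
);
CREATE TABLE gene (
    gene_id   INTEGER PRIMARY KEY,
    source_id INTEGER NOT NULL REFERENCES source(source_id),
    accession TEXT NOT NULL
);
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    source_id  INTEGER NOT NULL REFERENCES source(source_id),
    type       TEXT NOT NULL
);
-- mappings between genes, and between genes and features
CREATE TABLE gene_gene_map (
    gene_id_a INTEGER NOT NULL REFERENCES gene(gene_id),
    gene_id_b INTEGER NOT NULL REFERENCES gene(gene_id),
    source_id INTEGER NOT NULL REFERENCES source(source_id)
);
CREATE TABLE gene_feature_map (
    gene_id    INTEGER NOT NULL REFERENCES gene(gene_id),
    feature_id INTEGER NOT NULL REFERENCES feature(feature_id),
    source_id  INTEGER NOT NULL REFERENCES source(source_id)
);
CREATE TABLE assembly (
    assembly_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE
);
-- placement of either a gene or a feature (exactly one) on an assembly
CREATE TABLE assembly_location (
    assembly_id INTEGER NOT NULL REFERENCES assembly(assembly_id),
    gene_id     INTEGER REFERENCES gene(gene_id),
    feature_id  INTEGER REFERENCES feature(feature_id),
    chrom       TEXT NOT NULL,
    start_pos   INTEGER NOT NULL,
    end_pos     INTEGER NOT NULL,
    strand      INTEGER CHECK (strand IN (-1, 0, 1)),
    CHECK ((gene_id IS NULL) <> (feature_id IS NULL))
);
CREATE INDEX assembly_location_range
    ON assembly_location (assembly_id, chrom, start_pos, end_pos);
"""


def overlapping(conn, assembly, chrom, start, end):
    """Return (gene_id, feature_id, start_pos, end_pos) for placements overlapping [start, end]."""
    return conn.execute(
        """SELECT l.gene_id, l.feature_id, l.start_pos, l.end_pos
             FROM assembly_location AS l
             JOIN assembly AS a USING (assembly_id)
            WHERE a.name = ? AND l.chrom = ?
              AND l.start_pos <= ? AND l.end_pos >= ?
            ORDER BY l.start_pos""",
        (assembly, chrom, end, start),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO source VALUES (1, 'example_source')")
conn.execute("INSERT INTO assembly VALUES (1, 'example_build')")
conn.execute("INSERT INTO gene VALUES (1, 1, 'GENE1')")
conn.execute("INSERT INTO feature VALUES (1, 1, 'exon')")
conn.execute("INSERT INTO assembly_location VALUES (1, 1, NULL, 'chr1', 100, 500, 1)")
conn.execute("INSERT INTO assembly_location VALUES (1, NULL, 1, 'chr1', 150, 200, 1)")
hits = overlapping(conn, "example_build", "chr1", 180, 300)
print(hits)  # [(1, None, 100, 500), (None, 1, 150, 200)]
```

Putting genes and features in one placement table (with a CHECK that exactly one is set) keeps "everything in this region" down to a single indexed query; a production version would probably want a binning or interval scheme on top of the plain composite index.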

I'd like to steal as much as possible from what's out there already, which is why I'm writing (surprise surprise).

I'm having some difficulty finding pointers to the schemas underpinning the genome databases, but I'm sure I just haven't looked hard enough. Any information you can give me would be greatly appreciated.

What I understand so far (correct me where I'm wrong):

	- biosql is for sequences and features, not mappings to assemblies (is that intended to be added too, or is it beyond its scope?)
	- GGB runs off the schema in Bio::DB::GFF, which is not biosql-compatible (Lincoln, is that right? If so, do you have any plans to change that?)
	- Apollo doesn't run off any particular schema (is that true?), but rather pulls data through adaptors/APIs
	- Wormbase is an AceDB
	- the Ensembl schema excels at modeling contigs and assemblies (but is probably 5x more than what I want; is there a subset one could prune it down to that covers what I'm after?)

What about Flybase? Chris M., would that be you? Someone told me that since you're leading biosql too, one could just use biosql and get the essence of Flybase along with it.

While waiting for responses (hoping there will be some :) I thought I'd reverse-engineer ERDs from the DDL I find in biosql and Bio::DB::GFF, because I hate trying to understand a schema from CREATE TABLE statements. Let me know if that's already been done and I just overlooked the respective URLs. Also, we'll eventually implement this database in Oracle, and my understanding is that none of the schemas I mentioned runs on, or has been ported to, Oracle (that understanding may be wrong; better yet, I hope it is).
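One low-tech way to do the ERD reverse-engineering mentioned above is sketched here, assuming the DDL has first been trimmed to SQLite-compatible syntax (MySQL- or PostgreSQL-specific clauses would have to be removed by hand). It loads the DDL into a scratch in-memory SQLite database, reads the declared foreign keys back with `PRAGMA foreign_key_list`, and emits a Graphviz DOT graph. The function names and the sample tables are hypothetical, not the actual biosql or Bio::DB::GFF DDL.

```python
import sqlite3


def erd_edges(ddl):
    """Return (child_table, child_col, parent_table, parent_col) for each declared foreign key."""
    conn = sqlite3.connect(":memory:")  # scratch database, discarded afterwards
    try:
        conn.executescript(ddl)
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        edges = []
        for table in tables:
            # Each row: (id, seq, table, from, to, on_update, on_delete, match)
            for row in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
                parent_col = row[4] if row[4] is not None else "(primary key)"
                edges.append((table, row[3], row[2], parent_col))
        return edges
    finally:
        conn.close()


def to_dot(edges):
    """Render foreign-key edges as a Graphviz DOT digraph (child -> parent)."""
    lines = ["digraph erd {"]
    for child, col, parent, parent_col in edges:
        lines.append(f'  "{child}" -> "{parent}" [label="{col} -> {parent_col}"];')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical DDL excerpt, already reduced to SQLite-compatible syntax.
DDL = """
CREATE TABLE sequence (seq_id INTEGER PRIMARY KEY, accession TEXT);
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY,
                      seq_id INTEGER REFERENCES sequence(seq_id),
                      type TEXT);
CREATE TABLE feature_location (feature_id INTEGER REFERENCES feature(feature_id),
                               start_pos INTEGER, end_pos INTEGER);
"""

edges = erd_edges(DDL)
print(to_dot(edges))
```

Saving the output as erd.dot and running `dot -Tpng erd.dot -o erd.png` renders the diagram. Only foreign keys declared with REFERENCES are found, so relationships implied purely by column naming (common in MySQL DDL without enforced constraints) would be missed.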

	-hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp@gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------