[Open-bio-l] Schema for genes & features & mappings to assemblies

Chris Mungall cjm@bdgp.lbl.gov
Mon, 22 Apr 2002 15:12:09 -0700 (PDT)


On Mon, 22 Apr 2002, Hilmar Lapp wrote:

> First off, apologies to all who get this twice or multiple times.
>
> We're setting out to build a database to integrate genes, features,
> mappings between genes, mappings between genes and features, and
> mappings of features and genes to genome assemblies, all from different
> sources, for browsing, visualization, and fast queries (NOT for
> editing).
>
> I'd like to steal as much as possible from what's out there already,
> which is why I'm writing (surprise surprise).
>
> I'm having some difficulties finding the right pointers to schemas
> underpinning the genome databases, but I'm sure I just haven't been
> looking enough. Any piece of information you can give me would be
> greatly appreciated.
>
> What I understand so far (correct me where I'm wrong):
>
> 	- biosql is for sequences and features, not mappings to assemblies
> (is that intended to be added, too, or is it beyond its scope? )
> 	- GGB is running off the schema in Bio::DB::GFF, which is not
> biosql compatible (Lincoln? If so, do you have any plans to change
> that?)
> 	- Apollo doesn't run off any particular schema (is that true?),
> but rather pulls data through adaptors/APIs
> 	- Wormbase is an AceDB
> 	- the Ensembl schema excels at modeling contigs and assemblies
> (but is probably 5x more than what I want; is there a piece one could
> prune that encompasses what I'm after?)
>
> What about Flybase? Chris M., would that be you? Someone told me that since
> he's leading biosql too, you could just use that and get the essence of
> flybase along with it.

FlyBase uses GadFly for its annotation database; you can find more here:
www.fruitfly.org/developers.

biosql has some of gadfly's essence sprinkled onto it, but there are still
various differences.

biosql has adaptors for the bioperl objects, whereas gadfly has adaptors to
its own (not quite bioperl-compliant) perl object model.

> While waiting for responses (hoping that there are going to be some :) I
> thought I'd reverse engineer ERDs from the DDL I find in biosql and
> Bio::DB::GFF, because I hate trying to understand a schema based on
> CREATE TABLE statements. Let me know if that's already been done and I
> just overlooked the respective URLs. Also, we'll eventually implement
> this database in Oracle, and my understanding is that none of the things
> I mentioned is in or has been ported to Oracle (the latter may be,
> better yet, hopefully is, wrong).

there really needs to be some kind of website set up for biosql. I hope
folks aren't waiting on me to do this; I'm rubbish at that sort of thing.

in the biosql repository, in docs/biosql-schema.html, there is an
automatically generated html doc of the schema, with all foreign keys
bidirectionally hyperlinked. not quite an ER diagram, but it's better than
a manually generated diagram that gets out of sync with the schema.

the postgres schema gets automatically generated from the mysql one, which
is regarded as the canonical source. it should be easy to generate an oracle version too.

Right now the script for doing the conversion is pretty hacky - I'm just
about to commit a replacement.
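For anyone wanting to try an oracle target, the mysql-to-postgres step essentially amounts to rewriting vendor-specific DDL. Below is a minimal sketch of that idea, assuming simple, regular DDL; it is not the biosql conversion script, and the rewrite table (and the names REWRITES and mysql_to_pg) is hypothetical and deliberately incomplete. An oracle variant would need its own table; for example, oracle releases of this era have no SERIAL, so auto-increment keys would typically become a sequence plus a trigger.

```python
import re

# Hypothetical, deliberately incomplete MySQL -> PostgreSQL rewrite table.
# Order matters: the AUTO_INCREMENT rule must run before the plain type rules.
REWRITES = [
    # Auto-increment integer key -> SERIAL (SERIAL already implies NOT NULL).
    (r"\b(?:TINY|SMALL|MEDIUM)?INT(?:EGER)?(?:\(\d+\))?\s+(?:UNSIGNED\s+)?"
     r"(?:NOT\s+NULL\s+)?AUTO_INCREMENT\b", "SERIAL"),
    # PostgreSQL has no TINYINT.
    (r"\bTINYINT(?:\(\d+\))?", "SMALLINT"),
    # MySQL's larger text types collapse to TEXT.
    (r"\b(?:MEDIUM|LONG)TEXT\b", "TEXT"),
    # DOUBLE is spelled DOUBLE PRECISION (skip if already converted).
    (r"\bDOUBLE\b(?!\s+PRECISION)", "DOUBLE PRECISION"),
    # PostgreSQL has no UNSIGNED modifier; drop it.
    (r"\s+UNSIGNED\b", ""),
    # Drop MySQL table options such as TYPE=INNODB / ENGINE=InnoDB.
    (r"\)\s*(?:TYPE|ENGINE)\s*=\s*\w+\s*;", ");"),
]


def mysql_to_pg(ddl):
    """Apply each rewrite in order, then swap MySQL backtick quoting for ANSI quotes."""
    for pattern, replacement in REWRITES:
        ddl = re.sub(pattern, replacement, ddl, flags=re.IGNORECASE)
    return ddl.replace("`", '"')


# Illustrative input only; not taken from the biosql schema.
SAMPLE = """CREATE TABLE seqfeature (
  seqfeature_id INT(10) UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  strand TINYINT NOT NULL,
  score DOUBLE
) TYPE=INNODB;"""

if __name__ == "__main__":
    print(mysql_to_pg(SAMPLE))
```

A regex pass like this knows nothing about comments or quoted strings, which is one reason regex-only converters tend to end up hacky.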

there are also some postgres-specific extensions. Some of these, such as
views, should work on oracle, but the pg functions will need porting if you
decide to use them.

I promise to write some more docs on biosql this week. If someone could
get the basic framework for a website for it set up on open-bio, that'd be
great.

> 	-hilmar
>