[BioSQL-l] BioSQL and ontology "standards".

Brad Chapman chapmanb at 50mail.com
Mon Dec 15 01:17:50 UTC 2008


Hi all;
I wanted to reply to the BioSQL ontology discussion Peter started up
last month. He summarized how naming is currently done in the ontology
and term tables and some of the potential downsides to that:

> Currently BioPerl and Biopython (and I assume the other projects but
> haven't checked) use a couple of ad-hoc ontology names for storing
> annotation.  In particular, if there is no predefined entry for a
> novel ontology term, it gets added on the fly.  This is very
> convenient as it means a BioSQL database can be used without first
> importing a predefined ontology.  However there are downsides, for
> example spelling errors in the keys of a GenBank file get treated as
> ontology entries.

There was some general consensus that a more formalized, or at least
documented, naming scheme would be good, provided there is some leniency
for adding terms if they don't fall into the scheme. I agree, and think
this suggestion by Peter is good:

> On a related point, it might make more sense to use a predefined
> ontology, like SOFA or SO from http://www.sequenceontology.org/

As a starting point, I put together a mapping of GenBank header,
feature and qualifier keys to SO terms (and to other standard
ontologies like Dublin Core). If this is a direction we'd like to
go, the mapping would provide high-level documentation for reference
implementations. It is currently about 3/4 finished but should give a
good idea of the approach; I'd need help from someone more familiar
with SO to fill in some of the missing terms.

It got a little out of control for a mailing list post, so I wrote
up the motivation and details here:

http://bcbio.wordpress.com/2008/12/14/standard-ontologies-in-biosql/

The tab-delimited file mapping GenBank terms to ontology terms
is posted there as a starting place.
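
To make the idea concrete, here is a rough sketch of how a loader might
use such a mapping while keeping the leniency discussed above. The
column names, the sample rows and the fallback ontology name are all
assumptions made for illustration; they are not the layout of the actual
file, and this is not how BioPerl or Biopython currently behave.

```python
import csv
import io

# Hypothetical sample rows; the real mapping file's columns and
# contents may differ.
SAMPLE_MAPPING = (
    "genbank_key\tontology\tterm\n"
    "CDS\tSO\tCDS\n"
    "gene\tSO\tgene\n"
    "KEYWORDS\tDublin Core\tsubject\n"
)

# Hypothetical ontology name for keys that have no predefined mapping.
FALLBACK_ONTOLOGY = "ad-hoc terms"


def load_mapping(handle):
    """Read tab-delimited rows into {genbank_key: (ontology, term)}."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["genbank_key"]: (row["ontology"], row["term"]) for row in reader}


def resolve_term(key, mapping, unmapped):
    """Return (ontology, term) for a GenBank key.

    Unknown keys are still accepted, but they go into a fallback
    ontology and are recorded in `unmapped` so they can be reviewed.
    """
    if key in mapping:
        return mapping[key]
    unmapped.append(key)
    return (FALLBACK_ONTOLOGY, key)


mapping = load_mapping(io.StringIO(SAMPLE_MAPPING))
unmapped = []
# "prodcut" stands in for a misspelled key in a GenBank file.
for key in ["CDS", "KEYWORDS", "prodcut"]:
    print(key, "->", resolve_term(key, mapping, unmapped))
print("unmapped keys to review:", unmapped)
```

Recording the unmapped keys separately means a misspelled key still
loads, but it no longer silently becomes an ordinary ontology entry.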

Brad


