[BioSQL-l] BioSQL and ontology "standards".

Peter biopython at maubp.freeserve.co.uk
Thu Dec 4 15:04:44 UTC 2008


On Thu, Dec 4, 2008 at 1:25 PM, Leighton Pritchard <lpritc at scri.ac.uk> wrote:
> With apologies if I'm misinterpreting the tide of discussion, but I would be
> disappointed to see a default behaviour of "bung everything under
> 'Annotation Tags', typos and all" become a 'standard' of any sort, rather
> than a placeholder for future development of ontology-aware Bio* code that
> queries and populates BioSQL appropriately.

Overall, I agree.  It isn't ideal, but the current ad-hoc "ontology"
is useful in that its looseness allows any parsable GenBank file to be
imported into the database.  Pinning down the current behaviour as a
"standard" for better interoperability between the Bio* projects is
a good thing, even if this is only a short-term goal.  In the long term,
yes, maybe all Bio* projects should be able to cope with any
(optional) strict ontology instead.

> I see the situation as pretty much analogous to the effective requirement
> for NCBI taxon data in BioSQL, when using Biopython: you need to load in the
> NCBI taxon data before your own data can be imported in a taxon-aware
> manner.

This is going off topic, but that's not really true any more.

It used to be the case that if you wanted to record the NCBI taxonomy
when loading GenBank files into BioSQL with Biopython, you would
ideally first prepopulate the taxonomy tables with the BioSQL
load_ncbi_taxonomy.pl script.

I should go and update http://www.biopython.org/wiki/BioSQL now that
Biopython 1.49 is out, since this release can optionally fill in the
taxonomic lineage on demand by querying NCBI Entrez.  Either way, it
does "play nice" with running load_ncbi_taxonomy.pl before or after
loading records with Biopython.
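To illustrate why the order doesn't matter, here is a minimal Python sketch of the "insert only what is missing" idea, using sqlite3 from the standard library. It is NOT Biopython's code: the toy taxon table is far simpler than BioSQL's real taxon/taxon_name tables, FAKE_NCBI and the lookup callable are hypothetical stand-ins for an NCBI Entrez query, and bulk_preload is only a rough analogue of load_ncbi_taxonomy.pl. (In Biopython itself, the on-demand lookup is switched on via the fetch_NCBI_taxonomy=True argument to a BioSQL database's load() method.)

```python
import sqlite3

# HYPOTHETICAL stand-in for an NCBI Entrez lookup:
# NCBI taxon ID -> (scientific name, parent NCBI taxon ID).
# The IDs and names are invented for illustration only.
FAKE_NCBI = {
    100: ("Exampleroot", None),
    101: ("Examplegenus", 100),
    102: ("Examplegenus examplus", 101),
}


def make_db():
    """Create a toy taxon table (far simpler than BioSQL's real schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE taxon ("
        " ncbi_taxon_id INTEGER PRIMARY KEY,"
        " parent_ncbi_id INTEGER,"
        " name TEXT NOT NULL)"
    )
    return conn


def bulk_preload(conn, taxonomy):
    """Rough analogue of a bulk taxonomy loader: insert every node,
    silently skipping any that are already stored."""
    conn.executemany(
        "INSERT OR IGNORE INTO taxon VALUES (?, ?, ?)",
        [(tid, parent, name) for tid, (name, parent) in taxonomy.items()],
    )


def ensure_lineage(conn, ncbi_id, lookup):
    """On-demand fill-in: store ncbi_id plus any missing ancestors.

    Walks up the lineage, calling lookup() only for nodes not already in
    the table, and stops at the first stored node (assuming its ancestors
    are stored too). Returns how many nodes were looked up and inserted.
    """
    inserted = 0
    current = ncbi_id
    while current is not None:
        stored = conn.execute(
            "SELECT 1 FROM taxon WHERE ncbi_taxon_id = ?", (current,)
        ).fetchone()
        if stored:
            break
        name, parent = lookup(current)
        conn.execute("INSERT INTO taxon VALUES (?, ?, ?)", (current, parent, name))
        inserted += 1
        current = parent
    return inserted


def count_rows(conn):
    """Number of taxon nodes currently stored."""
    return conn.execute("SELECT COUNT(*) FROM taxon").fetchone()[0]


if __name__ == "__main__":
    # Order 1: load records (fetching lineage on demand), bulk load afterwards.
    db = make_db()
    print(ensure_lineage(db, 102, lambda tid: FAKE_NCBI[tid]))  # 3
    bulk_preload(db, FAKE_NCBI)
    print(count_rows(db))  # 3, nothing duplicated

    # Order 2: bulk load first, so the on-demand step has nothing to fetch.
    db = make_db()
    bulk_preload(db, FAKE_NCBI)
    print(ensure_lineage(db, 102, lambda tid: FAKE_NCBI[tid]))  # 0
```

The design choice that makes both orders safe in this sketch is that each path only adds rows that are absent (a primary key on the NCBI taxon ID plus INSERT OR IGNORE on one side, an existence check on the other), so running one after the other never duplicates a node.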

Peter


