[BioSQL-l] BioSQL and ontology "standards".

James Procter jimp at compbio.dundee.ac.uk
Thu Dec 4 11:45:24 UTC 2008


Hi - I'm very sorry to break the thread a little - particularly with the
deep discussion that's going on. Peter drew my attention to the thread
in his reply to my ps. on another thread:

Peter's reply to my original PS:
>> ps. on a side issue - have the various Bio* language bindings actually
>> been specified formally? If so - where might I find them?
>>
>
> I think the answer to that is sadly a no.  For Biopython work, I have
> been treating BioPerl as the reference implementation for BioSQL.

Peter wrote:
> On Fri, Nov 28, 2008 at 7:16 PM, Richard Holland wrote:
>> BioJava does what BioPerl does and pretty much makes it up as it goes
>> along, using whatever the input files tell it.
> 
>  OK, good. 
<SNIP>

As a brutal summary, leaving all Peter's questions unanswered, that
statement suggests a consensus - BioPerl is the 'reference' mapping.
However, I personally do not yet know enough about each Bio* sequence
feature structure to verify that this is the case.

>> I think the best approach is always to use what the file says, and
>> trust that it's accurate. What needs to be agreed between projects is
>> any additional annotations that get introduced outside the context of
>> file parsing, and the names of the ontologies used for the file
>> annotations so that all projects use the same ontologies and don't
>> replicate them inside the BioSQL database. It would be nice to
>> standardise these names and the additional custom terms across the
>> projects, in much the same way as people tried already to standardise
>> the way general objects get mapped to BioSQL.
> 
> This is what I am trying to get at here - documenting the existing "ad
> hoc" ontology usage.  My impression is that it has not been
> documented, and that the BioPerl behaviour is the de facto BioSQL
> standard.
> 
> I'd like to pin down this standard, and extend it for situations like
> the location_qualifier_value.term_id and perhaps location.term_id
> where BioPerl seems to ignore the ontology issue.

I'd like to add my support for documenting this. However, to put into
perspective why this verification is necessary, I should explain my problem:

I've been evaluating the use of BioSQL as a back end database for DAS
source deployment. We are using both BioPerl and BioJava to interact
with the BioSQL database, but ultimately aim to serve bioentry
annotation as DAS features. This means that there needs to be a clear
mapping between a BioSQL bioentry's annotation and the attributes of one or more
DAS features, and that mapping needs to be honoured by all the Bio*
object bindings utilised by the various programs interacting with the
BioSQL database.

DAS features are actually pretty simple. To begin with, I'm only
interested in unambiguously mapping the core DAS/1 feature attributes:
- location (start, end and strand)
- type (which may additionally have a sequence annotation ontology term)
- label (free text relating to the type term)
- feature score (again associated with the type)
- URLs (often added as href properties)
- method (free text but often has associated evidence code)
- notes (free text which may include additional ontological terms)
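To make the idea of a mapping concrete, here is a minimal Python sketch, assuming a simplified view of one BioSQL seqfeature: its type and source terms, a single location row (start_pos, end_pos, strand) and its qualifier/value pairs. Only the strand-to-orientation step follows a documented convention (DAS uses "+", "-" and "0"). Everything else is hypothetical and chosen purely for illustration: the class names, the qualifier names "label", "score" and "href", and the use of the source term as the DAS method. None of it describes what BioPerl, BioJava or Dazzle actually do.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


# Simplified, HYPOTHETICAL view of one BioSQL seqfeature joined with its
# location row and seqfeature_qualifier_value rows (term name -> values).
# This is an illustration, not any Bio* binding's real object model.
@dataclass
class SeqFeatureRow:
    type_term: str     # term name behind seqfeature.type_term_id
    source_term: str   # term name behind seqfeature.source_term_id
    start_pos: int     # location.start_pos
    end_pos: int       # location.end_pos
    strand: int        # location.strand: 1, -1 or 0
    qualifiers: Dict[str, List[str]] = field(default_factory=dict)


# The core DAS/1 feature attributes listed above.
@dataclass
class DasFeature:
    start: int
    end: int
    orientation: str        # DAS uses "+", "-" or "0"
    type_id: str
    label: Optional[str]
    score: Optional[str]
    method: str
    links: List[str]
    notes: List[str]


# HYPOTHETICAL qualifier names: agreeing on these across the Bio* projects
# is exactly what still needs to be specified.
LABEL_KEY = "label"
SCORE_KEY = "score"
LINK_KEY = "href"


def _take_first(quals: Dict[str, List[str]], key: str) -> Optional[str]:
    """Remove a qualifier and return its first value, or None if absent."""
    values = quals.pop(key, [])
    return values[0] if values else None


def to_das(row: SeqFeatureRow) -> DasFeature:
    """Map a seqfeature to DAS/1 attributes under the assumptions above."""
    orientation = {1: "+", -1: "-"}.get(row.strand, "0")
    quals = dict(row.qualifiers)  # shallow copy; the input row is not modified
    label = _take_first(quals, LABEL_KEY)
    score = _take_first(quals, SCORE_KEY)
    links = list(quals.pop(LINK_KEY, []))
    # Qualifiers with no agreed DAS mapping fall back to free-text notes.
    notes = [f"{key}={value}"
             for key, values in sorted(quals.items())
             for value in values]
    return DasFeature(
        start=row.start_pos,
        end=row.end_pos,
        orientation=orientation,
        type_id=row.type_term,
        label=label,
        score=score,
        method=row.source_term,  # assumption: source term used as DAS method
        links=links,
        notes=notes,
    )
```

Whatever names are eventually agreed, a mapping like this is only reliable if every Bio* binding writes the same qualifier terms, under the same ontology, when it stores a feature.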

I'm building on the mapping started by Benjamin Schuster Bockler and
implemented in Dazzle. However, I've already run into some mismatches
and I now need to clarify whether we are misusing the BioPerl sequence
feature binding, or whether the BioJava->DAS part of the mapping is
broken. A formal specification, or at the very least a mapping diagram,
is therefore pretty much essential. It will also enable better 'out of
the box' support for access to BioSQL datasources in other applications.

Jim.


