[Open-bio-l] Re: [BioSQL-l] Ontology names

Thomas Down td2@sanger.ac.uk
Mon, 30 Sep 2002 10:45:43 +0100


On Fri, Sep 27, 2002 at 11:52:56AM -0700, Hilmar Lapp wrote:
>
> Ontology names will likely (but are not required to) have NULL in 
> category_id.
> 
> Is everyone OK with this so far?
> 
> In order to get things out by a Bio* package other than the one that 
> put it in, we need to agree on ontology names in the first place 
> (but also on terms).
> 
> I am right now using the following ontology names:
> 
> - 'Annotation Tags': the keys (tags, qualifier names) for simple 
> annotation values (qualifier values)
> - 'SeqFeature Keys': the keys of seqfeatures ($feat->primary_tag() 
> slot in bioperl; e.g., the genbank feature key, or swissprot feature 
> key, like 'CDS', 'mRNA', ...)
> - 'SeqFeature Sources': the source names of seqfeatures 
> ($feat->source_tag() slot in bioperl; like 'swissprot', 'genscan', 
> etc).
> 
> There is already a pre-defined number of terms for location 
> properties (min_start, etc), but without an ontology. I'd like to 
> put them into an ontology and suggest the name 'Location Tags' for 
> it.

Sorry to reply a bit late to this thread -- I've been having
a few problems with e-mails to and from these mailing lists
(probably DNS-related, and seem to be sorted out now).

Anyway, to me this all feels like it's trying to mix together
several different concepts.  Many (though by no means all)
ontology_terms are really defining properties of objects.
The keys used in seqfeature_qualifier_value are a very good
example of this.  Similarly the location qualifiers.

Looking specifically at properties, they can be defined by:

  - Their domain -- the class (or classes) of object to which
    they apply.

  - Their range -- the set of values which are allowed.

  - Their cardinality -- e.g., 0..1, exactly 1, 0..infinity

The domain might just be `seqfeature' or `seqfeature_location'.
But the interesting cases come when you set more restrictive
domains (say, "A feature of type SNP must have one or more
variants").  A more mundane application might be to define
the required set of qualifiers for a given feature type in
an EMBL feature table.
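
To make the domain/range/cardinality idea concrete, here is a minimal
Python sketch of a property definition and a check against it.  Nothing
here is part of BioSQL or any Bio* package: PropertyDef, violations, the
qualifier name "variant" and the allowed values A/C/G/T are all
hypothetical, chosen only to illustrate the SNP example.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class PropertyDef:
    """Hypothetical property definition: domain, range and cardinality."""
    name: str
    domain: str                      # feature type the property applies to
    in_range: Callable[[str], bool]  # range: predicate for allowed values
    min_count: int = 0               # cardinality lower bound
    max_count: Optional[int] = None  # upper bound; None means unbounded


def violations(feature_type, qualifiers, definitions):
    """Return one message per constraint that the feature breaks."""
    problems = []
    for d in definitions:
        if d.domain != feature_type:
            continue  # property does not apply to this class of object
        values = qualifiers.get(d.name, [])
        if len(values) < d.min_count:
            problems.append(
                f"{d.name}: expected at least {d.min_count}, got {len(values)}")
        if d.max_count is not None and len(values) > d.max_count:
            problems.append(
                f"{d.name}: expected at most {d.max_count}, got {len(values)}")
        for v in values:
            if not d.in_range(v):
                problems.append(f"{d.name}: value {v!r} is outside the range")
    return problems


# "A feature of type SNP must have one or more variants".
# The A/C/G/T range is an assumption made for this example only.
snp_variant = PropertyDef(
    name="variant",
    domain="SNP",
    in_range=lambda v: v in {"A", "C", "G", "T"},
    min_count=1,
)
```

Calling violations("SNP", {}, [snp_variant]) reports the missing variant,
while the same call with feature type "CDS" reports nothing, because the
property's domain does not include CDS features.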

We're now taking ontology_terms somewhat beyond being a simple
controlled vocabulary, and into schema-land.  I don't know what
people's feelings are on this.  My understanding is that the
original plan with ontology_term was to leave it totally opaque,
then join on some extra tables which included relationship/schema
information.

As I understand it (please correct me if I've got the wrong
end of this), the `category' concept seems to be trying to
mix up aspects of property domains (for ontology_terms which
define names of properties) and property ranges (for terms which
are used as values -- e.g. seqfeature_key).  Is this actually
a sensible thing to do?


Hilmar: I know you're on a tight schedule with this.  If adding
a category field solves your problem, today, then go for it.
However, it might be better to put this on a separate table,
for ease of untangling stuff in the future (it also avoids having
an FK to self, although you still get a circular reference, of
course).
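
One possible shape for the separate-table option, sketched in Python with
an in-memory SQLite database.  The table ontology_term_category and all
column names are assumptions made for illustration, not the actual BioSQL
schema; the term names are taken from the examples quoted above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default

conn.executescript("""
CREATE TABLE ontology_term (
    ontology_term_id INTEGER PRIMARY KEY,
    term_name        TEXT NOT NULL UNIQUE
);

-- Hypothetical side table: ontology_term itself carries no category column,
-- so it has no foreign key to itself.  Both columns here still point back
-- at ontology_term, which is the remaining circular reference.
CREATE TABLE ontology_term_category (
    ontology_term_id INTEGER PRIMARY KEY
                     REFERENCES ontology_term (ontology_term_id),
    category_id      INTEGER NOT NULL
                     REFERENCES ontology_term (ontology_term_id)
);
""")

conn.execute("INSERT INTO ontology_term VALUES (1, 'SeqFeature Keys')")
conn.execute("INSERT INTO ontology_term VALUES (2, 'CDS')")
conn.execute("INSERT INTO ontology_term_category VALUES (2, 1)")

# Terms without a row in the side table (such as ontology names) simply
# come back with no category.
rows = conn.execute("""
    SELECT t.term_name, c.term_name
      FROM ontology_term t
      LEFT JOIN ontology_term_category tc
             ON tc.ontology_term_id = t.ontology_term_id
      LEFT JOIN ontology_term c
             ON c.ontology_term_id = tc.category_id
     ORDER BY t.ontology_term_id
""").fetchall()
```

Dropping the category concept later would then mean dropping one table
rather than altering ontology_term itself.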

     Thomas.


PS. The way I've discussed properties here is very DAML-esque.
    At some point in the past, I remember a discussion about doing
    DAML definitions for the open-bio datamodels.  Did this
    ever get off the ground?