[BioSQL-l] Seqfeature_Source
Hilmar Lapp
hlapp@gnf.org
Mon, 23 Sep 2002 00:42:25 -0700
On Friday, September 20, 2002, at 05:50 AM, Thomas Down wrote:
> On Thu, Sep 19, 2002 at 04:17:40PM -0700, Hilmar Lapp wrote:
>>
>> Of course <eye opener> -- I overlooked that the association already
>> exists. So what about putting seqfeature_source there as well (i.e.,
>> as a qualifier?)
>
> How do you plan to do this? I can think of three possibilities:
>
> - Have a standard tag for seqfeature_source, and then put
> the source value (as a string) in the current
> seqfeature_qualifier_value table. I don't have any particular
> objections to this, but it's got the same problem as putting
> the source as a text attribute in the main seqfeature
> table: it leaves the source as an opaque string.
Why is the string in seqfeature_source so different from this?
Maybe I'm missing something.
>
> - Add a second qualifier_value table, for associations
> in which the value part is a second ontology_term. This
> might have uses beyond storing the source.
>
> - Just tag each seqfeature with a particular ontology_term which
> defines its source. This could go in seqfeature_qualifier_value
> with a NULL value. Quite elegant, but really requires some
> `proper' ontology support (rather than just the current stuff,
> which is just controlled vocab). At least enough to be
> able to specify which ontology_terms are valid as sources.
>
> All of these are potentially workable. None are simpler than the
> current setup, though :-(.
>
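To make the first and third options above concrete, here is a minimal sketch using Python's sqlite3 module. It assumes a much-simplified schema: the table names seqfeature_qualifier_value and ontology_term come from this thread, but the column names, the ids, and the source value 'EMBL' are illustrative assumptions, not the actual biosql definitions.

```python
import sqlite3

# In-memory database with a deliberately simplified, hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ontology_term (
    ontology_term_id INTEGER PRIMARY KEY,
    term_name        TEXT NOT NULL UNIQUE
);
CREATE TABLE seqfeature_qualifier_value (
    seqfeature_id    INTEGER NOT NULL,
    ontology_term_id INTEGER NOT NULL
                     REFERENCES ontology_term (ontology_term_id),
    qualifier_value  TEXT  -- NULL is allowed, which option 3 relies on
);
""")

# Option 1: one standard tag, with the source stored as an opaque string.
conn.execute("INSERT INTO ontology_term VALUES (1, 'seqfeature_source')")
conn.execute("INSERT INTO seqfeature_qualifier_value VALUES (100, 1, 'EMBL')")

# Option 3: the source itself is a term; the qualifier value stays NULL.
conn.execute("INSERT INTO ontology_term VALUES (2, 'EMBL')")
conn.execute("INSERT INTO seqfeature_qualifier_value VALUES (101, 2, NULL)")

# Option 1 reads the source back from the value column.
option1_sources = [row[0] for row in conn.execute(
    "SELECT qualifier_value FROM seqfeature_qualifier_value "
    "WHERE ontology_term_id = 1")]

# Option 3 reads the source back from the term table via a join.
option3_sources = [row[0] for row in conn.execute(
    "SELECT t.term_name FROM seqfeature_qualifier_value q "
    "JOIN ontology_term t ON t.ontology_term_id = q.ontology_term_id "
    "WHERE q.qualifier_value IS NULL")]
```

Note that nothing in this sketch says which terms are valid as sources; that is the extra ontology support the third option would need.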
The problem is not confined to seqfeature_sources. Think of
gene_name annotations for instance. Gene_name goes as ontology_term,
but the interesting stuff ends up as a qualifier value in the
bioentry/ontology_term association table. Not only is the value a
LOB, which cannot be indexed in a straightforward way, it will also
occur multiple times if it is associated with more than one bioentry
(which in many cases it will be), and hence obtaining a non-redundant
list of gene names is non-trivial. The present solution may look
simple, but it's a bad solution. Gene names should go into the
ontology_term table instead.
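As a rough illustration of the redundancy, here is a minimal sketch using Python's sqlite3 module. Everything in it is an assumption made for the example: the table name bioentry_qualifier_value stands in for the bioentry/ontology_term association table, the category column on ontology_term is hypothetical, and the gene names and ids are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ontology_term (
    ontology_term_id INTEGER PRIMARY KEY,
    term_name        TEXT NOT NULL UNIQUE,
    category         TEXT  -- hypothetical column, not part of biosql
);
CREATE TABLE bioentry_qualifier_value (
    bioentry_id      INTEGER NOT NULL,
    ontology_term_id INTEGER NOT NULL
                     REFERENCES ontology_term (ontology_term_id),
    qualifier_value  TEXT
);
""")

# Present approach: 'gene_name' is the term, the actual name is a value
# that gets repeated for every bioentry it is associated with.
conn.execute("INSERT INTO ontology_term VALUES (1, 'gene_name', NULL)")
conn.executemany(
    "INSERT INTO bioentry_qualifier_value VALUES (?, ?, ?)",
    [(10, 1, "BRCA1"), (11, 1, "BRCA1"), (12, 1, "TP53")])

stored_values = [row[0] for row in conn.execute(
    "SELECT qualifier_value FROM bioentry_qualifier_value "
    "WHERE ontology_term_id = 1 ORDER BY bioentry_id")]

# A non-redundant list needs a DISTINCT over the unindexed value column.
distinct_values = [row[0] for row in conn.execute(
    "SELECT DISTINCT qualifier_value FROM bioentry_qualifier_value "
    "WHERE ontology_term_id = 1 ORDER BY qualifier_value")]

# Alternative: each gene name is itself a row in ontology_term, so the
# non-redundant list is a plain lookup on a uniquely indexed column.
conn.executemany(
    "INSERT INTO ontology_term VALUES (?, ?, ?)",
    [(2, "BRCA1", "gene_name"), (3, "TP53", "gene_name")])

term_names = [row[0] for row in conn.execute(
    "SELECT term_name FROM ontology_term "
    "WHERE category = 'gene_name' ORDER BY term_name")]
```

In the first layout the name 'BRCA1' is stored once per bioentry; in the second it is stored once, and the association rows would only carry the term id.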
If seqfeature_source should sit in its own table, so should
gene_name. And over time, we'll encounter other things that should
as well.
I'm not saying this is bad by definition, but I am saying this
defeats the generic ontology-based design of biosql.
-hilmar
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------