[BioSQL-l] Seqfeature_Source

Hilmar Lapp hlapp@gnf.org
Mon, 23 Sep 2002 00:42:25 -0700


On Friday, September 20, 2002, at 05:50 AM, Thomas Down wrote:

> On Thu, Sep 19, 2002 at 04:17:40PM -0700, Hilmar Lapp wrote:
>>
>> Of course <eye opener> -- I overlooked that the association already
>> exists. So what about putting seqfeature_source there as well (i.e.,
>> as a qualifier?)
>
> How do you plan to do this?  I can think of three possibilities:
>
>   - Have a standard tag for seqfeature_source, and then put
>     the source value (as a string) in the current
>     seqfeature_qualifier_value table.  I don't have any particular
>     objections to this, but it's got the same problem as putting
>     the source as a text attribute in the main seqfeature
>     table: it leaves the source as an opaque string.

Why is the string in seqfeature_source so different from this?

Maybe I'm missing something.

>
>   - Add a second qualifier_value table, for associations
>     in which the value part is a second ontology_term.  This
>     might have uses beyond storing the source.
>
>   - Just tag each seqfeature with a particular ontology_term which
>     defines its source.  This could go in seqfeature_qualifier_value
>     with a NULL value.  Quite elegant, but really requires some
>     `proper' ontology support (rather than just the current stuff,
>     which is just controlled vocab).  At least enough to be
>     able to specify which ontology_terms are valid as sources.
>
> All of these are potentially workable.  None are simpler than the
> current setup, though :-(.
>
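
To make the three options concrete, here is a rough sketch of what each
would store for one feature. It is not the actual BioSQL DDL: the
column lists are simplified, the seqfeature_qualifier_term table is a
hypothetical name for the "second qualifier_value table", and 'EMBL' is
just sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ontology_term (
    ontology_term_id INTEGER PRIMARY KEY,
    term_name        TEXT NOT NULL UNIQUE
);
-- Simplified stand-in for the existing qualifier table (options 1 and 3).
CREATE TABLE seqfeature_qualifier_value (
    seqfeature_id    INTEGER NOT NULL,
    ontology_term_id INTEGER NOT NULL REFERENCES ontology_term (ontology_term_id),
    qualifier_value  TEXT
);
-- Hypothetical second table whose value is itself a term (option 2).
CREATE TABLE seqfeature_qualifier_term (
    seqfeature_id    INTEGER NOT NULL,
    ontology_term_id INTEGER NOT NULL REFERENCES ontology_term (ontology_term_id),
    value_term_id    INTEGER NOT NULL REFERENCES ontology_term (ontology_term_id)
);
INSERT INTO ontology_term VALUES (1, 'seqfeature_source');
INSERT INTO ontology_term VALUES (2, 'EMBL');
""")

# Option 1: standard tag, source kept as an opaque string.
conn.execute("INSERT INTO seqfeature_qualifier_value VALUES (10, 1, 'EMBL')")
# Option 2: standard tag, source stored as a second ontology_term.
conn.execute("INSERT INTO seqfeature_qualifier_term VALUES (11, 1, 2)")
# Option 3: the feature is tagged with the source term itself, value NULL.
conn.execute("INSERT INTO seqfeature_qualifier_value VALUES (12, 2, NULL)")

queries = {
    "option1": ("SELECT qualifier_value FROM seqfeature_qualifier_value "
                "WHERE seqfeature_id = ? AND ontology_term_id = 1", 10),
    "option2": ("SELECT t.term_name FROM seqfeature_qualifier_term q "
                "JOIN ontology_term t ON t.ontology_term_id = q.value_term_id "
                "WHERE q.seqfeature_id = ? AND q.ontology_term_id = 1", 11),
    # 'IS NULL' is only a placeholder here: a real schema would need a way
    # to say which ontology_terms are valid as sources.
    "option3": ("SELECT t.term_name FROM seqfeature_qualifier_value q "
                "JOIN ontology_term t ON t.ontology_term_id = q.ontology_term_id "
                "WHERE q.seqfeature_id = ? AND q.qualifier_value IS NULL", 12),
}
sources = {name: conn.execute(sql, (fid,)).fetchone()[0]
           for name, (sql, fid) in queries.items()}
```

All three lookups recover the same source name; they differ in whether
the source ends up as a string (option 1) or as a row in ontology_term
(options 2 and 3).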

The problem is not confined to seqfeature_source. Think of gene_name
annotations, for instance. gene_name goes in as an ontology_term, but
the interesting part, the name itself, ends up as a qualifier value in
the bioentry/ontology_term association table. Not only is the value a
LOB, which is not straightforward to index, it will also occur
multiple times if it is associated with more than one bioentry (which
in many cases it will be), and hence obtaining a non-redundant list of
gene names is non-trivial. The present solution may look simple, but
it's a bad solution. Gene names should go into the ontology_term table
instead.
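
A small sketch of the difference, under stated assumptions: the tables
below are simplified stand-ins rather than the real BioSQL schema, the
`category` column and the `bioentry_term` link table are hypothetical
devices for marking and attaching gene-name terms, and the gene names
and ids are sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ontology_term (
    ontology_term_id INTEGER PRIMARY KEY,
    term_name        TEXT NOT NULL,
    category         TEXT,            -- hypothetical, for illustration only
    UNIQUE (term_name, category)
);
-- Present layout: the name is a free-text value on the association.
CREATE TABLE bioentry_qualifier_value (
    bioentry_id      INTEGER NOT NULL,
    ontology_term_id INTEGER NOT NULL REFERENCES ontology_term (ontology_term_id),
    qualifier_value  TEXT
);
-- Alternative: the name is a term, and the association carries no value.
CREATE TABLE bioentry_term (          -- hypothetical link table
    bioentry_id      INTEGER NOT NULL,
    ontology_term_id INTEGER NOT NULL REFERENCES ontology_term (ontology_term_id)
);
INSERT INTO ontology_term VALUES (1, 'gene_name', NULL);
""")

# Present layout: 'BRCA1' is stored once per bioentry it annotates.
conn.executemany(
    "INSERT INTO bioentry_qualifier_value VALUES (?, 1, ?)",
    [(101, "BRCA1"), (102, "BRCA1"), (103, "TP53")],
)

# Alternative: each gene name is stored once, as an ontology_term.
conn.executemany(
    "INSERT INTO ontology_term VALUES (?, ?, 'gene_name')",
    [(2, "BRCA1"), (3, "TP53")],
)
conn.executemany(
    "INSERT INTO bioentry_term VALUES (?, ?)",
    [(101, 2), (102, 2), (103, 3)],
)

# Present layout: de-duplication has to happen on the value column.
names_from_values = [r[0] for r in conn.execute(
    "SELECT DISTINCT qualifier_value FROM bioentry_qualifier_value "
    "WHERE ontology_term_id = 1 ORDER BY qualifier_value")]

# Alternative: the term table is already non-redundant.
names_from_terms = [r[0] for r in conn.execute(
    "SELECT term_name FROM ontology_term "
    "WHERE category = 'gene_name' ORDER BY term_name")]
```

Both queries return the same list here, but the first has to run
DISTINCT over the value column (a LOB in the real table), while the
second reads names that a uniqueness constraint already keeps
non-redundant.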

If seqfeature_source should sit in its own table, so should 
gene_name. And over time, we'll encounter other things that should 
as well.

I'm not saying this is bad by definition, but I am saying this 
defeats the generic ontology-based design of biosql.

	-hilmar
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------