[BioSQL-l] Seqfeature_Source
Hilmar Lapp
hlapp@gnf.org
Fri, 27 Sep 2002 11:34:34 -0700
I've now got it working in the following way:
- gene_name is an ontology term; its values go into
bioentry_qualifier_value
- seqfeature_source values go into table ontology_term, with the
FK seqfeature_source_id on seqfeature now pointing to ontology_term
(the other FK ontology_term_id gives the seqfeature key [==
primary_tag in bioperl]).
So, seqfeature_source remains normalized, but is treated as an
ontology term.
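
To make the layout concrete, here is a minimal sketch using Python's sqlite3 module with an in-memory database. It is not the actual BioSQL DDL: only the table names and the two FK column names (ontology_term_id, seqfeature_source_id) come from the description above, while the term_name column and the sample values (CDS, exon, EMBL, GenScan) are assumptions made up for illustration.

```python
import sqlite3

# In-memory database; nothing here touches a real BioSQL instance.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Simplified tables. term_name is an assumed column name.
conn.executescript("""
CREATE TABLE ontology_term (
    ontology_term_id INTEGER PRIMARY KEY,
    term_name        TEXT NOT NULL UNIQUE
);
CREATE TABLE seqfeature (
    seqfeature_id        INTEGER PRIMARY KEY,
    -- feature key (primary_tag in bioperl)
    ontology_term_id     INTEGER NOT NULL
        REFERENCES ontology_term (ontology_term_id),
    -- feature source, now also stored as an ontology term
    seqfeature_source_id INTEGER NOT NULL
        REFERENCES ontology_term (ontology_term_id)
);
""")

# Hypothetical sample terms: two feature keys and two sources.
conn.executemany(
    "INSERT INTO ontology_term (term_name) VALUES (?)",
    [("CDS",), ("exon",), ("EMBL",), ("GenScan",)],
)
term_id = {
    name: tid
    for tid, name in conn.execute(
        "SELECT ontology_term_id, term_name FROM ontology_term"
    )
}

# Three features; two of them share the same source record.
conn.executemany(
    "INSERT INTO seqfeature (ontology_term_id, seqfeature_source_id) "
    "VALUES (?, ?)",
    [
        (term_id["CDS"], term_id["EMBL"]),
        (term_id["exon"], term_id["EMBL"]),
        (term_id["exon"], term_id["GenScan"]),
    ],
)

# Resolve key and source by joining ontology_term twice.
rows = conn.execute("""
    SELECT f.seqfeature_id, k.term_name, s.term_name
    FROM seqfeature f
    JOIN ontology_term k ON k.ontology_term_id = f.ontology_term_id
    JOIN ontology_term s ON s.ontology_term_id = f.seqfeature_source_id
    ORDER BY f.seqfeature_id
""").fetchall()

# A non-redundant list of sources, without scanning string values.
sources = [
    name
    for (name,) in conn.execute("""
        SELECT DISTINCT s.term_name
        FROM seqfeature f
        JOIN ontology_term s
          ON s.ontology_term_id = f.seqfeature_source_id
        ORDER BY s.term_name
    """)
]
```

Both FKs point at the same table, so the key and the source are each stored once and shared by every feature that refers to them.
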
Thomas, would this be OK with you? If anyone else doesn't like this,
please shout now.
-hilmar
On Monday, September 23, 2002, at 08:32 AM, Thomas Down wrote:
> On Mon, Sep 23, 2002 at 12:42:25AM -0700, Hilmar Lapp wrote:
>>>
>>> How do you plan to do this? I can think of three possibilities:
>>>
>>> - Have a standard tag for seqfeature_source, and then put
>>> the source value (as a string) in the current
>>> seqfeature_qualifier_value table. I don't have any particular
>>> objections to this, but it's got the same problem as putting
>>> the source as a text attribute in the main seqfeature
>>> table: it leaves the source as an opaque string.
>>
>> Why is the string in seqfeature_source so different from this?
>
> It's normalized. Multiple features can point to the same
> record in seqfeature_source. Potentially, additional
> information could be joined onto the seqfeature_source table
> without having to replicate it for every feature with a given
> source.
>
> Changing this isn't necessarily /wrong/. But it does feel
> like a (small) step backwards to me. Especially since
> the other baseline feature property (from a Biojava perspective)
> the `type' is normalized (originally in seqfeature_key, now moved
> to ontology_term).
>
>> The problem is not confined to seqfeature_sources. Think of
>> gene_name annotations for instance. Gene_name goes as ontology_term,
>> but the interesting stuff ends up as a qualifier value in the
>> bioentry/ontology_term association table. Not only is the value a
>> LOB which is not indexable straightforwardly, it also will occur
>> multiple times if it is associated with more than one bioentry
>> (which it in many cases will), and hence obtaining a non-redundant
>> list of gene names is non-trivial. The present solution may look
>> simple, but it's a bad solution. Gene names should go into the
>> ontology_term table instead.
>>
>> If seqfeature_source should sit in its own table, so should
>> gene_name. And over time, we'll encounter other things that should
>> as well.
>
> I quite agree with this. Except /please/ don't call it gene_name.
> But I think there are some fairly good arguments for having a
> seqfeature_name (or similar) table. Of course, adding this has
> other issues. It seems to be a many-to-many relationship. There are
> also namespacing issues (which might be solved by strongly encouraging
> the use of LSIDs).
>
> <change_of_subject />
>
> Thinking a bit more generally about your changes to BioSQL, and issues
> you discussed at BOSC, I've noticed some overlap with the ways we're
> talking about handling annotated sequence in BioJava2. The basic plan
> is to separate features (which might be genes, or other objects) from
> their mappings onto sequences. All the type information, and most
> (all) of the key-value stuff (which will hopefully be more strongly
> constrained by the type system) goes onto the FeatureCard, while the
> FeatureMapping stays very simple. It allows you to build a system
> which gives equal weight to `gene-centric' and `sequence-centric'
> views of your annotation (unlike BioJava1, which turns out very
> strongly sequence-centric).
>
> I don't know if there's any enthusiasm at all for building this
> kind of pattern into the next generation of BioSQL. But you
> might be interested to look over the FeatureCard/FeatureMapping
> discussions on the biojava-dev list. At some point in the
> (hopefully fairly near) future, I'll write a summary of this.
>
> Thomas
>
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------