[BioSQL-l] Seqfeature_Source

Hilmar Lapp hlapp@gnf.org
Fri, 27 Sep 2002 11:34:34 -0700


I've now got it working in the following way:

	- gene_name is an ontology term; its values go into
bioentry_qualifier_value

	- seqfeature_source values go into table ontology_term, with the 
FK seqfeature_source_id on seqfeature now pointing to ontology_term 
(the other FK ontology_term_id gives the seqfeature key [== 
primary_tag in bioperl]).

So, seqfeature_source remains normalized, but is treated as an 
ontology term.
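A minimal sketch of the layout described above, using Python's sqlite3 module. The tables are a heavily simplified, hypothetical subset that keeps only the columns named in this message (real BioSQL tables carry more, and term uniqueness there is not a bare UNIQUE on the name). Both the feature key and the feature source resolve through ontology_term, and the three features share a single source row.

```python
import sqlite3

# Hypothetical, heavily simplified subset of the tables discussed above.
SCHEMA = """
CREATE TABLE ontology_term (
    ontology_term_id INTEGER PRIMARY KEY,
    term_name        TEXT NOT NULL UNIQUE
);
CREATE TABLE seqfeature (
    seqfeature_id        INTEGER PRIMARY KEY,
    ontology_term_id     INTEGER NOT NULL REFERENCES ontology_term,  -- key (primary_tag)
    seqfeature_source_id INTEGER NOT NULL REFERENCES ontology_term   -- source
);
"""


def term_id(conn, name):
    """Return the id of term `name`, creating the row only if it is new."""
    conn.execute("INSERT OR IGNORE INTO ontology_term (term_name) VALUES (?)", (name,))
    return conn.execute(
        "SELECT ontology_term_id FROM ontology_term WHERE term_name = ?", (name,)
    ).fetchone()[0]


def add_feature(conn, primary_tag, source):
    """Store a feature whose key and source both point into ontology_term."""
    conn.execute(
        "INSERT INTO seqfeature (ontology_term_id, seqfeature_source_id) VALUES (?, ?)",
        (term_id(conn, primary_tag), term_id(conn, source)),
    )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
for tag in ("gene", "CDS", "exon"):
    add_feature(conn, tag, "EMBL")  # three features, one shared source term

# ontology_term is joined twice: once for the key, once for the source.
rows = conn.execute("""
    SELECT k.term_name, s.term_name
    FROM seqfeature f
    JOIN ontology_term k ON k.ontology_term_id = f.ontology_term_id
    JOIN ontology_term s ON s.ontology_term_id = f.seqfeature_source_id
    ORDER BY f.seqfeature_id
""").fetchall()
n_terms = conn.execute("SELECT COUNT(*) FROM ontology_term").fetchone()[0]
print(rows)     # [('gene', 'EMBL'), ('CDS', 'EMBL'), ('exon', 'EMBL')]
print(n_terms)  # 4: gene, CDS, exon and a single EMBL row
```

The source stays normalized in the sense Thomas cares about: "EMBL" is stored once, however many features use it.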

Thomas, would this be OK with you? If anyone else doesn't like this, 
please shout now.

	-hilmar

On Monday, September 23, 2002, at 08:32 AM, Thomas Down wrote:

> On Mon, Sep 23, 2002 at 12:42:25AM -0700, Hilmar Lapp wrote:
>>>
>>> How do you plan to do this?  I can think of three possibilities:
>>>
>>>  - Have a standard tag for seqfeature_source, and then put
>>>    the source value (as a string) in the current
>>>    seqfeature_qualifier_value table.  I don't have any particular
>>>    objections to this, but it's got the same problem as putting
>>>    the source as a text attribute in the main seqfeature
>>>    table: it leaves the source as an opaque string.
>>
>> Why is the string in seqfeature_source so different from this?
>
> It's normalized.  Multiple features can point to the same
> record in seqfeature_source.  Potentially, additional
> information could be joined onto the seqfeature_source table
> without having to replicate it for every feature with a given
> source.
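A small sketch of the normalization argument, again with sqlite3 and hypothetical, simplified tables. The `description` column is purely illustrative and not part of BioSQL. Information added once to the shared source row becomes visible to every feature with that source, without being replicated per feature.

```python
import sqlite3

# Hypothetical, simplified tables: features reference one shared source row
# instead of each carrying its own copy of the source string.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE seqfeature_source (
    seqfeature_source_id INTEGER PRIMARY KEY,
    source_name          TEXT NOT NULL UNIQUE
);
CREATE TABLE seqfeature (
    seqfeature_id        INTEGER PRIMARY KEY,
    seqfeature_source_id INTEGER NOT NULL REFERENCES seqfeature_source
);
INSERT INTO seqfeature_source (source_name) VALUES ('genscan');
INSERT INTO seqfeature (seqfeature_source_id) VALUES (1), (1), (1);
""")

# Additional information is joined onto the source table once
# (`description` is an illustrative column, not part of BioSQL)...
conn.execute("ALTER TABLE seqfeature_source ADD COLUMN description TEXT")
updated = conn.execute(
    "UPDATE seqfeature_source SET description = ? WHERE source_name = ?",
    ("ab initio gene predictor", "genscan"),
).rowcount

# ...and every feature with that source sees it through the join.
described = conn.execute("""
    SELECT f.seqfeature_id, s.source_name, s.description
    FROM seqfeature f JOIN seqfeature_source s USING (seqfeature_source_id)
    ORDER BY f.seqfeature_id
""").fetchall()
print(updated)  # 1: one row changed, not one per feature
print(described)
```

With the source stored as an opaque string per feature, the same change would have to touch every feature row.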
>
> Changing this isn't necessarily /wrong/.  But it does feel like a
> (small) step backwards to me, especially since the other baseline
> feature property (from a BioJava perspective), the `type', is
> normalized (originally in seqfeature_key, now moved to
> ontology_term).
>
>> The problem is not confined to seqfeature_sources. Think of
>> gene_name annotations, for instance. Gene_name goes in as an
>> ontology_term, but the interesting part (the gene name itself) ends
>> up as a qualifier value in the bioentry/ontology_term association
>> table. Not only is that value a LOB, which is not straightforwardly
>> indexable; it will also occur multiple times if it is associated
>> with more than one bioentry (which in many cases it will), so
>> obtaining a non-redundant list of gene names is non-trivial. The
>> present solution may look simple, but it's a bad one. Gene names
>> should go into the ontology_term table instead.
>>
>> If seqfeature_source should sit in its own table, so should
>> gene_name. And over time, we'll encounter other things that should
>> as well.
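To make the gene_name point concrete, here is a sketch contrasting the two layouts with sqlite3. The tables are hypothetical simplifications, the gene_name term id is made up, the TEXT column stands in for the LOB, and the annotations are invented data. Storing the name as a qualifier value repeats it per bioentry, so a non-redundant list needs a DISTINCT scan. Storing it as a term makes the term table itself the non-redundant list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Current layout: the gene name is a qualifier value, one copy per bioentry.
CREATE TABLE bioentry_qualifier_value (
    bioentry_id      INTEGER NOT NULL,
    ontology_term_id INTEGER NOT NULL,  -- id of the 'gene_name' term
    value            TEXT NOT NULL      -- stands in for the LOB column
);
-- Proposed layout: each distinct gene name is one ontology_term row.
CREATE TABLE ontology_term (
    ontology_term_id INTEGER PRIMARY KEY,
    term_name        TEXT NOT NULL UNIQUE
);
CREATE TABLE bioentry_ontology_term (
    bioentry_id      INTEGER NOT NULL,
    ontology_term_id INTEGER NOT NULL REFERENCES ontology_term,
    PRIMARY KEY (bioentry_id, ontology_term_id)
);
""")

GENE_NAME_TERM_ID = 99  # hypothetical id of the 'gene_name' term
annotations = [(1, "geneA"), (2, "geneA"), (3, "geneA"), (4, "geneB")]  # made-up data

for entry, name in annotations:
    # Current layout: store the name itself again for every bioentry.
    conn.execute("INSERT INTO bioentry_qualifier_value VALUES (?, ?, ?)",
                 (entry, GENE_NAME_TERM_ID, name))
    # Proposed layout: create the term once, then link the bioentry to it.
    conn.execute("INSERT OR IGNORE INTO ontology_term (term_name) VALUES (?)", (name,))
    conn.execute("INSERT INTO bioentry_ontology_term "
                 "SELECT ?, ontology_term_id FROM ontology_term WHERE term_name = ?",
                 (entry, name))

qualifier_rows = conn.execute("SELECT COUNT(*) FROM bioentry_qualifier_value").fetchone()[0]
# Current layout: a non-redundant list needs DISTINCT over the repeated values.
distinct_names = [r[0] for r in conn.execute(
    "SELECT DISTINCT value FROM bioentry_qualifier_value ORDER BY value")]
# Proposed layout: the term table already is the non-redundant list (a real
# ontology_term table holds other terms too, so it would need a filter).
term_names = [r[0] for r in conn.execute(
    "SELECT term_name FROM ontology_term ORDER BY term_name")]
print(qualifier_rows, distinct_names, term_names)
```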
>
> I quite agree with this.  Except /please/ don't call it gene_name.
> But I think there are some fairly good arguments for having a
> seqfeature_name (or similar) table.  Of course, adding this has
> other issues.  It seems to be a many-to-many relationship.  There are
> also namespacing issues (which might be solved by strongly encouraging
> the use of LSIDs).
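One possible shape for the seqfeature_name idea, sketched with sqlite3. Everything here is hypothetical: the table and column names, the `namespace` column (where an LSID authority or similar might go) and the sample names. A link table carries the many-to-many relationship, and uniqueness on (namespace, name) lets the same name exist in different namespaces.

```python
import sqlite3

# Hypothetical tables for a many-to-many feature/name relationship.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE seqfeature (seqfeature_id INTEGER PRIMARY KEY);
CREATE TABLE seqfeature_name (
    seqfeature_name_id INTEGER PRIMARY KEY,
    namespace          TEXT NOT NULL,  -- e.g. an LSID authority (assumption)
    name               TEXT NOT NULL,
    UNIQUE (namespace, name)           -- same name allowed in other namespaces
);
CREATE TABLE seqfeature_name_assoc (
    seqfeature_id      INTEGER NOT NULL REFERENCES seqfeature,
    seqfeature_name_id INTEGER NOT NULL REFERENCES seqfeature_name,
    PRIMARY KEY (seqfeature_id, seqfeature_name_id)
);
INSERT INTO seqfeature VALUES (1), (2);
INSERT INTO seqfeature_name (namespace, name) VALUES
    ('nsA', 'foo'), ('nsA', 'bar'), ('nsB', 'foo');
-- feature 1 has two names; ('nsA', 'foo') is shared by both features
INSERT INTO seqfeature_name_assoc VALUES (1, 1), (1, 2), (2, 1), (2, 3);
""")


def names_of(feature_id):
    """All (namespace, name) pairs attached to one feature."""
    return conn.execute("""
        SELECT n.namespace, n.name FROM seqfeature_name n
        JOIN seqfeature_name_assoc a USING (seqfeature_name_id)
        WHERE a.seqfeature_id = ? ORDER BY n.namespace, n.name""",
        (feature_id,)).fetchall()


def features_named(namespace, name):
    """Ids of all features carrying a given name within a namespace."""
    return [r[0] for r in conn.execute("""
        SELECT a.seqfeature_id FROM seqfeature_name_assoc a
        JOIN seqfeature_name n USING (seqfeature_name_id)
        WHERE n.namespace = ? AND n.name = ? ORDER BY a.seqfeature_id""",
        (namespace, name))]


print(names_of(1))                  # [('nsA', 'bar'), ('nsA', 'foo')]
print(features_named("nsA", "foo"))  # [1, 2]
```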
>
> <change_of_subject />
>
> Thinking a bit more generally about your changes to BioSQL, and issues
> you discussed at BOSC, I've noticed some overlap with the ways we're
> talking about handling annotated sequence in BioJava2.  The basic plan
> is to separate features (which might be genes, or other objects) from
> their mappings onto sequences.  All the type information, and most
> (possibly all) of the key-value stuff (which will hopefully be more
> strongly constrained by the type system) goes onto the FeatureCard,
> while the FeatureMapping stays very simple.  It allows you to build a
> system which gives equal weight to `gene-centric' and
> `sequence-centric' views of your annotation (unlike BioJava1, which
> turned out to be very strongly sequence-centric).
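A rough Python illustration of the card/mapping split, under stated assumptions: the class and method names are invented for this sketch and are not the BioJava2 API, which the message only outlines. The card holds the type and the key-value annotation, while the mapping only places a card on a sequence. Both views then come from the same data.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureCard:
    """Hypothetical: carries the type and key-value annotation of a feature."""
    card_id: str
    feature_type: str
    annotations: dict = field(default_factory=dict)


@dataclass(frozen=True)
class FeatureMapping:
    """Hypothetical and deliberately minimal: places a card on a sequence."""
    card_id: str
    seq_id: str
    start: int
    end: int
    strand: int  # +1 or -1


class AnnotationStore:
    """Hypothetical container offering both views over the same data."""

    def __init__(self):
        self.cards = {}
        self.mappings = []

    def add_card(self, card):
        self.cards[card.card_id] = card

    def add_mapping(self, mapping):
        self.mappings.append(mapping)

    def gene_centric(self, card_id):
        """A feature plus every place it maps to."""
        return self.cards[card_id], [m for m in self.mappings if m.card_id == card_id]

    def sequence_centric(self, seq_id):
        """Features mapped onto one sequence, ordered by start position."""
        hits = sorted((m for m in self.mappings if m.seq_id == seq_id),
                      key=lambda m: m.start)
        return [(m.start, m.end, self.cards[m.card_id].feature_type) for m in hits]


store = AnnotationStore()
store.add_card(FeatureCard("g1", "gene", {"name": "geneA"}))
store.add_card(FeatureCard("r1", "repeat"))
store.add_mapping(FeatureMapping("g1", "chr1", 500, 900, 1))
store.add_mapping(FeatureMapping("g1", "contig7", 10, 410, -1))
store.add_mapping(FeatureMapping("r1", "chr1", 100, 200, 1))

card, placements = store.gene_centric("g1")
print(card.feature_type, len(placements))  # gene 2
print(store.sequence_centric("chr1"))      # [(100, 200, 'repeat'), (500, 900, 'gene')]
```

Because the annotation lives only on the card, a gene mapped onto two sequences is still one object in this sketch, which is what gives the gene-centric view equal standing.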
>
> I don't know if there's any enthusiasm at all for building this
> kind of pattern into the next generation of BioSQL.  But you
> might be interested to look over the FeatureCard/FeatureMapping
> discussions on the biojava-dev list.  At some point in the
> (hopefully fairly near) future, I'll write a summary of this.
>
>     Thomas
--
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------