[BioSQL-l] Seqfeature_Source

Thomas Down td2@sanger.ac.uk
Wed, 18 Sep 2002 11:14:05 +0100


On Thu, Sep 12, 2002 at 12:52:56PM -0700, Hilmar Lapp wrote:
> I'm not sure what was the idea behind this single-column table. 
> Unless I'm missing something, I propose to collapse this table into 
> a column on Seqfeature for simplicity reasons (makes both the schema 
> and adaptor code simpler, and can only improve performance).

I know this may be a bit late to reply -- I'm afraid your original
message got lost in the open-bio network outage.

Could I put in an argument against this change?  The original
BioSQL design was `normalize everything', and I think it would
be a shame to move away from this.  In particular, if you make
the seqfeature.source column a char() or varchar(), you're left
with questions like what character limit to use (255 seems to be
the BioSQL standard, IIRC).

I'm also not 100% convinced that this /will/ make things faster.
RDBMSs in general really like looking things up by indexed integer
columns.  Might be something worth benchmarking.
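
For anyone who wants to try that benchmark, here is a minimal sketch
using Python's built-in sqlite3 module.  It is not the real BioSQL
schema: the tables are cut down to the columns that matter here, the
names seqfeature_norm and seqfeature_flat and all column names are
made up for illustration, and the source names are dummy data.
SQLite is only a stand-in, so timings on MySQL, PostgreSQL or Oracle
may well look different.

```python
import sqlite3
import time

# Hypothetical, cut-down tables (not the real BioSQL definitions).
SOURCES = ("GenBank", "EMBL", "SwissProt")  # dummy sample data


def build(conn, n_features=20000, sources=SOURCES):
    """Create both designs side by side and fill them with identical data."""
    cur = conn.cursor()
    # Design A: sources live in their own table, referenced by integer key.
    cur.execute("CREATE TABLE seqfeature_source ("
                " seqfeature_source_id INTEGER PRIMARY KEY,"
                " source_name VARCHAR(255) NOT NULL UNIQUE)")
    cur.execute("CREATE TABLE seqfeature_norm ("
                " seqfeature_id INTEGER PRIMARY KEY,"
                " seqfeature_source_id INTEGER NOT NULL"
                "  REFERENCES seqfeature_source(seqfeature_source_id))")
    cur.execute("CREATE INDEX norm_src ON seqfeature_norm(seqfeature_source_id)")
    # Design B: the source name is stored inline on every feature row.
    cur.execute("CREATE TABLE seqfeature_flat ("
                " seqfeature_id INTEGER PRIMARY KEY,"
                " source VARCHAR(255) NOT NULL)")
    cur.execute("CREATE INDEX flat_src ON seqfeature_flat(source)")

    cur.executemany("INSERT INTO seqfeature_source (source_name) VALUES (?)",
                    [(s,) for s in sources])
    ids = {name: sid for sid, name in cur.execute(
        "SELECT seqfeature_source_id, source_name FROM seqfeature_source")}
    rows = [(i, sources[i % len(sources)]) for i in range(1, n_features + 1)]
    cur.executemany("INSERT INTO seqfeature_norm VALUES (?, ?)",
                    [(i, ids[s]) for i, s in rows])
    cur.executemany("INSERT INTO seqfeature_flat VALUES (?, ?)", rows)
    conn.commit()


def count_norm(conn, source):
    """Count features from one source via the integer-keyed lookup table."""
    return conn.execute(
        "SELECT COUNT(*) FROM seqfeature_norm f"
        " JOIN seqfeature_source s"
        "   ON s.seqfeature_source_id = f.seqfeature_source_id"
        " WHERE s.source_name = ?", (source,)).fetchone()[0]


def count_flat(conn, source):
    """Count features from one source via the inline varchar column."""
    return conn.execute(
        "SELECT COUNT(*) FROM seqfeature_flat WHERE source = ?",
        (source,)).fetchone()[0]


def timed(fn, *args, repeat=50):
    """Run fn(*args) `repeat` times; return (last result, elapsed seconds)."""
    start = time.perf_counter()
    for _ in range(repeat):
        result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build(conn)
    for label, fn in (("normalized", count_norm), ("flat", count_flat)):
        n, secs = timed(fn, conn, "EMBL")
        print(f"{label}: {n} rows, {secs:.4f}s for 50 queries")
```

Both queries have to return the same count; only the elapsed time
should differ.  A fair test would also want realistic row counts and
the join patterns the adaptor code actually issues.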

A couple of notes:

   - I'm not hard-nosed about this, and am open to persuasion
     (especially if it really does make things run a lot faster).

   - Something which I proposed at Cape Town, but was dropped at
     the time, was to re-use the ontology_term mechanism for
     sources, since

         a) ontology_term is used in all the other places BioSQL
            wants controlled vocabulary stuff.

         b) in the future, I can see people who run big analysis
            pipelines with large numbers of analyses creating ontologies
            to define all the different source fields in their
            database.
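
To make the ontology_term idea concrete, here is a rough sketch,
again with Python's sqlite3 module.  Only the table name
ontology_term comes from BioSQL; the column names, the helper
functions, and the assumption that feature keys point at the same
table are mine, purely for illustration.  The feature keys and
source names are dummy data.

```python
import sqlite3

# Hypothetical, cut-down tables; column names are assumptions.
SCHEMA = """
CREATE TABLE ontology_term (
    ontology_term_id INTEGER PRIMARY KEY,
    term_name        VARCHAR(255) NOT NULL UNIQUE
);
CREATE TABLE seqfeature (
    seqfeature_id        INTEGER PRIMARY KEY,
    -- both columns draw on the same controlled vocabulary table
    seqfeature_key_id    INTEGER NOT NULL
        REFERENCES ontology_term(ontology_term_id),
    seqfeature_source_id INTEGER NOT NULL
        REFERENCES ontology_term(ontology_term_id)
);
"""


def term_id(conn, name):
    """Return the id of the named term, creating the term if needed."""
    conn.execute(
        "INSERT OR IGNORE INTO ontology_term (term_name) VALUES (?)",
        (name,))
    return conn.execute(
        "SELECT ontology_term_id FROM ontology_term WHERE term_name = ?",
        (name,)).fetchone()[0]


def add_feature(conn, key, source):
    """Insert a feature whose key and source are both ontology terms."""
    cur = conn.execute(
        "INSERT INTO seqfeature (seqfeature_key_id, seqfeature_source_id)"
        " VALUES (?, ?)",
        (term_id(conn, key), term_id(conn, source)))
    return cur.lastrowid


def features_from(conn, source):
    """List the ids of all features that carry the given source term."""
    rows = conn.execute(
        "SELECT f.seqfeature_id FROM seqfeature f"
        " JOIN ontology_term t"
        "   ON t.ontology_term_id = f.seqfeature_source_id"
        " WHERE t.term_name = ?"
        " ORDER BY f.seqfeature_id", (source,))
    return [r[0] for r in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    add_feature(conn, "CDS", "pipeline_A")   # dummy key and source names
    add_feature(conn, "exon", "pipeline_B")
    print(features_from(conn, "pipeline_A"))
```

Laid out this way, the source is still an indexed integer lookup,
and a pipeline group that wants to describe its analyses only has to
add rows to ontology_term rather than touch the schema.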



Thomas.