[BioSQL-l] Seqfeature_Source
Thomas Down
td2@sanger.ac.uk
Wed, 18 Sep 2002 11:14:05 +0100
On Thu, Sep 12, 2002 at 12:52:56PM -0700, Hilmar Lapp wrote:
> I'm not sure what was the idea behind this single-column table.
> Unless I'm missing something, I propose to collapse this table into
> a column on Seqfeature for simplicity reasons (makes both the schema
> and adaptor code simpler, and can only improve performance).
I know this may be a bit late to reply -- I'm afraid your original
message got lost in the open-bio network outage.
Could I put in an argument against this change? The original
BioSQL design was `normalize everything', and I think it would
be a shame to move away from this. In particular, if you make
the seqfeature.source column a char() or varchar(), you're left
with questions like what character limit to use (255 seems to be
the BioSQL standard, IIRC).
I'm also not 100% convinced that this /will/ make things faster.
RDBMSs in general really like looking things up by indexed integer
columns. Might be something worth benchmarking.
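
For anyone who wants to try that benchmark, here is a rough sketch of
how the two layouts could be compared. It uses Python's sqlite3 module
purely as a stand-in for a real RDBMS, and the table and column names
(seqfeature_norm, seqfeature_flat, source_name and so on) are made up
for illustration rather than taken from the BioSQL schema. Numbers from
SQLite won't necessarily carry over to MySQL, PostgreSQL or Oracle.

```python
import sqlite3
import time

# Hypothetical source names, used only to populate the test tables.
SOURCES = ("EMBL", "GenBank", "SwissProt", "genscan")


def build(conn, n_features=20000, sources=SOURCES):
    """Create both layouts side by side and fill them with the same rows."""
    cur = conn.cursor()
    cur.executescript("""
        -- Normalized layout: sources live in their own lookup table.
        CREATE TABLE seqfeature_source (
            seqfeature_source_id INTEGER PRIMARY KEY,
            source_name VARCHAR(255) NOT NULL UNIQUE
        );
        CREATE TABLE seqfeature_norm (
            seqfeature_id INTEGER PRIMARY KEY,
            seqfeature_source_id INTEGER NOT NULL
                REFERENCES seqfeature_source (seqfeature_source_id)
        );
        CREATE INDEX norm_src ON seqfeature_norm (seqfeature_source_id);

        -- Collapsed layout: the source is a string column on the feature.
        CREATE TABLE seqfeature_flat (
            seqfeature_id INTEGER PRIMARY KEY,
            source VARCHAR(255) NOT NULL
        );
        CREATE INDEX flat_src ON seqfeature_flat (source);
    """)
    cur.executemany(
        "INSERT INTO seqfeature_source (source_name) VALUES (?)",
        [(s,) for s in sources],
    )
    ids = {
        name: sid
        for sid, name in cur.execute(
            "SELECT seqfeature_source_id, source_name FROM seqfeature_source"
        )
    }
    rows = [(i, sources[i % len(sources)]) for i in range(1, n_features + 1)]
    cur.executemany(
        "INSERT INTO seqfeature_norm VALUES (?, ?)",
        [(i, ids[s]) for i, s in rows],
    )
    cur.executemany("INSERT INTO seqfeature_flat VALUES (?, ?)", rows)
    conn.commit()


def count_norm(conn, source):
    """Count features for a source via a join on the integer key."""
    return conn.execute(
        "SELECT COUNT(*) FROM seqfeature_norm f "
        "JOIN seqfeature_source s "
        "  ON s.seqfeature_source_id = f.seqfeature_source_id "
        "WHERE s.source_name = ?",
        (source,),
    ).fetchone()[0]


def count_flat(conn, source):
    """Count features for a source by comparing the string column."""
    return conn.execute(
        "SELECT COUNT(*) FROM seqfeature_flat WHERE source = ?",
        (source,),
    ).fetchone()[0]


def timed(fn, *args, repeat=50):
    """Run fn repeatedly; return its last result and the elapsed seconds."""
    start = time.perf_counter()
    for _ in range(repeat):
        result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build(conn)
    for label, fn in (("normalized", count_norm), ("collapsed", count_flat)):
        n, secs = timed(fn, conn, "genscan")
        print(f"{label}: {n} rows, {secs:.4f}s for 50 queries")
```

A fair test would also want to time inserts and the adaptor's usual
fetch-features-for-a-sequence query, not just this one count.
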
A couple of notes:
- I'm not hard-nosed about this, and am open to persuasion
(especially if it really does make things run a lot faster).
- Something which I proposed at Cape Town, but was dropped at
the time, was to re-use the ontology_term mechanism for
sources, since
a) ontology_term is used in all the other places BioSQL
wants controlled vocabulary stuff.
b) in the future, I can see people who run big analysis
pipelines with large numbers of analyses creating ontologies
to define all the different source fields in their
database.
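
To make the ontology_term idea a bit more concrete, here is a minimal
sketch of what it might look like, again using Python's sqlite3 as a
stand-in. The column names (term_name, term_definition,
seqfeature_key_id, seqfeature_source_id) and the helper functions are
my assumptions for illustration and may not match the current schema.
The point is only that the source becomes one more foreign key into
the same term table that already holds the feature keys.

```python
import sqlite3

# Assumed, simplified schema: both the feature key and the feature
# source are foreign keys into a single ontology_term table.
SCHEMA = """
CREATE TABLE ontology_term (
    ontology_term_id INTEGER PRIMARY KEY,
    term_name VARCHAR(255) NOT NULL UNIQUE,
    term_definition TEXT
);
CREATE TABLE seqfeature (
    seqfeature_id INTEGER PRIMARY KEY,
    seqfeature_key_id INTEGER NOT NULL
        REFERENCES ontology_term (ontology_term_id),
    seqfeature_source_id INTEGER NOT NULL
        REFERENCES ontology_term (ontology_term_id)
);
"""


def term_id(conn, name, definition=None):
    """Return the id of the named term, creating the term if needed."""
    row = conn.execute(
        "SELECT ontology_term_id FROM ontology_term WHERE term_name = ?",
        (name,),
    ).fetchone()
    if row:
        return row[0]
    cur = conn.execute(
        "INSERT INTO ontology_term (term_name, term_definition) VALUES (?, ?)",
        (name, definition),
    )
    return cur.lastrowid


def add_feature(conn, key, source):
    """Store a feature whose key and source are both ontology terms."""
    cur = conn.execute(
        "INSERT INTO seqfeature (seqfeature_key_id, seqfeature_source_id) "
        "VALUES (?, ?)",
        (term_id(conn, key), term_id(conn, source)),
    )
    return cur.lastrowid


def describe(conn, seqfeature_id):
    """Return (key name, source name) for one feature."""
    return conn.execute(
        "SELECT k.term_name, s.term_name FROM seqfeature f "
        "JOIN ontology_term k ON k.ontology_term_id = f.seqfeature_key_id "
        "JOIN ontology_term s ON s.ontology_term_id = f.seqfeature_source_id "
        "WHERE f.seqfeature_id = ?",
        (seqfeature_id,),
    ).fetchone()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    fid = add_feature(conn, "exon", "genscan")
    print(describe(conn, fid))
```

With this layout a pipeline's sources could carry definitions and be
grouped like any other controlled vocabulary, at the cost of one more
join when the source name is needed.
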
Thomas.