[BioSQL-l] version NOT NULL
Hilmar Lapp
hlapp at gnf.org
Wed Mar 12 11:42:13 EST 2003
I actually vote for making all version columns NOT NULL where the
version is part of a unique key constraint (that would be bioentry and
dbxref).
The reason is practical, not theoretical. On the theory side there is
no question that version should be optional.
On the practical end of things though you encounter the following two
problems if that column may be NULL. I've been down that road.
1) Different RDBMSs handle NULLs in composite alternative keys (UKs)
in a different manner. Whether one way is correct and another isn't
doesn't really matter if your RDBMS happens to do it the wrong way.
Example: assuming a UK constraint on (acc,version) you can have
('X712462',NULL) only once in Oracle, but multiple times in MySQL (at
least v3.23.53). I haven't checked Pg's behaviour in this respect. If
NULLs are not allowed, then there is no question how your RDBMS is
going to handle that.
This is BTW also the reason why made most if not all of the ranks NOT
NULL. You just end up with duplicated entries if you rely on the RDBMS
enforcing the UK (which is why you have the UK in the first place) and
then it doesn't for some reasons.
2) The SQL clauses for constraining a column on a value or on being
NULL are quite different ("table.version = ?" as opposed to
"table.version IS NULL"). If you've got an object, e.g. a seq object,
and you want to look up in the database by a unique key whether it
already exists, and the unique key is composite where one of the
components may be NULL, then you have to go through either 2 queries or
quite some acrobatics in SQL construction (where I can tell you that
the resulting SQL is not going to be fast on many RDBMSs!) to make that
look up.
Example: say there is a sequence object and the attributes by which I
can look up is the familiar triple of (namespace,accession,version). If
I do not create the statement from scratch every time I do the look up
(which, if I did that, would give away an order of magnitude
performance win on RDBMSs that do real prepared statements, like
Oracle), I either have to prepare two statements for the same purpose
(which is quite different from any other situation, making it into a
special situation requiring special code - ugly), or I end up with a
monster clause like
WHERE ...
AND bioentry.biodatabase_id = ?
AND bioentry.accession = ?
AND (
bioentry.version = ?
OR (
bioentry.version IS NULL
AND ? IS NULL
)
)
which is not only going to be slow, but also requires you to bind the
same parameter twice which is again special code - ugly.
The way I resolved this (quite nicely I think - my opinion anyway) is
to introduce an artificial version 0 and treat it as NULL in the
adaptor code. I.e., if a version attribute is undef, it gets passed as
'0' to the statement execution method. If a version column has value 0,
the object's attribute is not set (remains undef) (in perl extremely
easy since 'if $value;' will treat 0 and undef as equivalent).
What I'm basically saying is that yes, theoretically you want version
to be nullable, but in practice if you do that you're asking for
trouble in the form of much more complex code, where the trouble is
avoidable with relatively little effort on the code side of things, and
the benefit is a big simplification of query construction and query
execution code, because you rid yourself of special cases that need to
be handled.
-hilmar
On Wednesday, March 12, 2003, at 06:39 AM, Aaron J Mackey wrote:
>
> It looks like all the NOT NULL constraints on version were removed at
> some
> point in history (which is good), but I noticed there's still one on
> dbxref's version ... is this a special case, or does it need to have
> its
> NOT NULL removed as well?
>
> -Aaron
>
> --
> Aaron J Mackey
> Pearson Laboratory
> University of Virginia
> (434) 924-2821
> amackey at virginia.edu
>
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at open-bio.org
> http://open-bio.org/mailman/listinfo/biosql-l
>
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------
More information about the BioSQL-l
mailing list