[BioSQL-l] version NOT NULL

Hilmar Lapp hlapp at gnf.org
Wed Mar 12 11:42:13 EST 2003


I actually vote for making all version columns NOT NULL where the 
version is part of a unique key constraint (that would be bioentry and 
dbxref).

The reason is practical, not theoretical. On the theory side there is 
no question that version should be optional.

On the practical end of things though you encounter the following two 
problems if that column may be NULL. I've been down that road.

	1) Different RDBMSs handle NULLs in composite alternative keys (UKs) 
in a different manner. Whether one way is correct and another isn't 
doesn't really matter if your RDBMS happens to do it the wrong way. 
Example: assuming a UK constraint on (acc,version) you can have 
('X712462',NULL) only once in Oracle, but multiple times in MySQL (at 
least v3.23.53). I haven't checked Pg's behaviour in this respect. If 
NULLs are not allowed, then there is no question how your RDBMS is 
going to handle that.

	This is BTW also the reason why made most if not all of the ranks NOT 
NULL. You just end up with duplicated entries if you rely on the RDBMS 
enforcing the UK (which is why you have the UK in the first place) and 
then it doesn't for some reasons.

	2) The SQL clauses for constraining a column on a value or on being 
NULL are quite different ("table.version = ?" as opposed to 
"table.version IS NULL"). If you've got an object, e.g. a seq object, 
and you want to look up in the database by a unique key whether it 
already exists, and the unique key is composite where one of the 
components may be NULL, then you have to go through either 2 queries or 
quite some acrobatics in SQL construction (where I can tell you that 
the resulting SQL is not going to be fast on many RDBMSs!) to make that 
look up.
	Example: say there is a sequence object and the attributes by which I 
can look up is the familiar triple of (namespace,accession,version). If 
I do not create the statement from scratch every time I do the look up 
(which, if I did that, would give away an order of magnitude 
performance win on RDBMSs that do real prepared statements, like 
Oracle), I either have to prepare two statements for the same purpose 
(which is quite different from any other situation, making it into a 
special situation requiring special code - ugly), or I end up with a 
monster clause like

	WHERE ...
	AND bioentry.biodatabase_id = ?
	AND bioentry.accession = ?
	AND (
		bioentry.version = ?
		OR (
			bioentry.version IS NULL
			AND ? IS NULL
		)
	)

	which is not only going to be slow, but also requires you to bind the 
same parameter twice which is again special code - ugly.

The way I resolved this (quite nicely I think - my opinion anyway) is 
to introduce an artificial version 0 and treat it as NULL in the 
adaptor code. I.e., if a version attribute is undef, it gets passed as 
'0' to the statement execution method. If a version column has value 0, 
the object's attribute is not set (remains undef) (in perl extremely 
easy since 'if $value;' will treat 0 and undef as equivalent).

What I'm basically saying is that yes, theoretically you want version 
to be nullable, but in practice if you do that you're asking for 
trouble in the form of much more complex code, where the trouble is 
avoidable with relatively little effort on the code side of things, and 
the benefit is a big simplification of query construction and query 
execution code, because you rid yourself of special cases that need to 
be handled.

	-hilmar

On Wednesday, March 12, 2003, at 06:39  AM, Aaron J Mackey wrote:

>
> It looks like all the NOT NULL constraints on version were removed at 
> some
> point in history (which is good), but I noticed there's still one on
> dbxref's version ... is this a special case, or does it need to have 
> its
> NOT NULL removed as well?
>
> -Aaron
>
> -- 
>  Aaron J Mackey
>  Pearson Laboratory
>  University of Virginia
>  (434) 924-2821
>  amackey at virginia.edu
>
>
> _______________________________________________
> BioSQL-l mailing list
> BioSQL-l at open-bio.org
> http://open-bio.org/mailman/listinfo/biosql-l
>
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------



More information about the BioSQL-l mailing list