[BioSQL-l] Problems in DB2 with VARCHAR, TEXT and CLOB using BioJava

Florian Mittag florian.mittag at uni-tuebingen.de
Thu Jul 9 15:16:12 UTC 2009


Hi all!

I'm posting this to both the BioSQL and the BioJava-dev mailinglist because 
the problem resides in both domains, I hope this is okay.

We're working on getting BioJava to run with a DB2 Express-C backend for 
various reasons. We've encountered several problems during this task, but 
this one seems to have no real solution.

When adapting the BioSQL schema to DB2, the official IBM conversion guide 
tells us to use the data type CLOB where MySQL uses TEXT.

(Chapter 11 in
ftp://ftp.software.ibm.com/software/data/db2/migration/mtk/mtk_2050.pdf)

So far, no problem. But when we tried reading some genebank files with 
BioJava, the DB2 driver threw an exception:

SQL0401N  The data types of the operands for the operation "=" are not 
compatible.  SQLSTATE=42818 SQLCODE=-401

Explanation:
The class org.biojavax.bio.db.biosql.BioSQLRichObjectBuilder defines some 
Hibernate queries, of which one has the conditions:

"from DocRef as cr where cr.authors = ? and cr.location = ? and cr.title = ?"

All three columns "authors", "location", and "title" are of type TEXT in MySQL 
and of type CLOB in DB2, so comparing them with "=" leads to the above error 
message.


The way I see it, there are only two possible solutions to this problem:
1) Change the query to
"from DocRef as cr where cr.authors LIKE '?' and cr.location LIKE '?' and 
cr.title LIKE '?'"

2) Change the data type to something comparable with "=", like VARCHAR.

Solution 1 is no real solution to me, because comparing values with "LIKE" 
usually is slow and it seems a bit odd to change a query that works with 
other databases just for DB2.

But taking a closer look, solution 2 has some problems, too:
Although VARCHARs in DB2 can have a length of theoretically 32767, in reality 
they are limited by the page size of the database, which can be 32K at 
maximum. Since this particular table "reference" has three columns of this 
type, the sum of their lengths must not exceed 32767, so they could only be 
something like VARCHAR(10000).

I have never encountered cases in which values come even close to the length 
of 10000, but you can never be sure.


And that is why I post here. For me, the way to go is pretty clear, but we 
intend to be as compatible as possible with the original BioSQL. Maybe you 
could give me some input on how to solve this problem with as few casualties 
as possible ;-)


Thanks,
Florian



More information about the BioSQL-l mailing list