[Bioperl-l] biosql primary key generation
Hilmar Lapp
hlapp@gnf.org
Sat, 10 Aug 2002 20:23:43 -0700
I've talked about this to several people at ISMB and BOSC, including
ThomasD and ChrisM, and at least no-one told me to stop. So, I'm
repeating my proposal here to a wider audience, to give everyone a
chance to shout.
First off, I'll give a backgrounder on the problem that bothers me.
If you know what I'm talking about, skip the next paragraph.
The problem with the topic is that the MySQL idiosyncratic way of
generating PKs behind the scenes via AUTO_INCREMENT fields is
completely incompatible with any other reasonable RDBMS, including
Postgres and Oracle. The problem arises if you want control over the
new PK assigned in INSERTs in order to return them to an
application. The 'normal' way to achieve this is that the RDBMS
provides you with so-called sequences, from which you obtain a new,
guaranteed unique PK and then supply this PK to the INSERT statement
instead of leaving the field empty. Since MySQL does not have
sequences, you need to resort to calling a certain function _after_
executing the INSERT statement, which is not only very different but
also makes for a different calling flow. Hence, porting to Oracle,
which is what I'm trying to do, is a major headache or a ticket to a
great deal of mess and code duplication.
Now my proposal: assume for the calling flow generally that the
actual RDBMS does have sequences, and hence before inserting a
record with a PK obtain a new value and supply it to the insert
statement. For RDBMSs like MySQL that do not have sequences, fake a
sequence.
Technically, given record- or table-level locks, a sequence can be
faked reasonably easy using a dedicated table. In fact, it turns out
that there is already a package DBIx::MySQLSequence on CPAN, which
does exactly that and wraps it in a nice API.
The only problem with faking sequences is if inserts are made not
only through bioperl-db, but also e.g. biojava (or any other bio*
package other than bioperl) into a MySQL-based database. I.e., other
packages have to re-implement the base functionality of
DBIx::MySQLSequence (which should be fairly easy though) to the
extent of averting PK clashes
Other than inserts, other projects would be unaffected.
Comments appreciated.
For me this is the sanest approach to cross-RDBMS code at this time,
and hence I've already gone ahead and started implementing it. I've
done a driver-specific API interface, the MySQL implementation, and
dynamic loading of the respective driver. Next thing is I'll rewrite
all inserts to follow the scheme I'm proposing.
If people vote against this I'll branch off and commit the changes
to the branch.
-hilmar
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------