[BioSQL-l] PostgreSQL & lookup & insert in bioperl-db
Charles Hauser
chauser at duke.edu
Thu Feb 27 11:06:01 EST 2003
Hilmar,
A couple of suggestions:
- I would crtainly run this problem by one of the postgres user groups
and see what suggestions they have. My experience has been they are
sharp and quite helpful.
http://www.ca.postgresql.org/users-lounge/
- I would also suggest contacting the GMOD/Chado developers as I feel
certain they would be dealing with the same issues.
Here's the possible solutions.
1) Turn off transactions for PostgreSQL, i.e., run in auto-commit mode.
Given the fact that Pg is a natively transactional database I find this
option possible but sad. In addition, there would have to be code that
specifically for PostgreSQL deletes the sequence entry if there was a
failure during a particular sequence entry. (Otherwise rolling back the
transaction takes care of cleaning up possibly incomplete rows.) On the
plus side, this is probably the simplest one to implement.
I would keep transactions ON - data integrity should be paramount.
2) Look up every record before insertion. On the plus side of things,
this would maintain transactions. On the downside, entities with a
finite number of records are already cached and looked up before insert
(examples being dbxref, reference, ontology_term). Those that are not
looked up are potentially infinite in number, meaning that 95% of the
time the record to be inserted will not already exist. A prior lookup
will be a waste of time in 95% of cases. I.e., this option will slow
down performance.
I ran a version of this (I think) for an EST database, where I looked up
say an est_id prior to loading it and a contig_id into an est_contig
relational table. When loading ~10,000 records I crashed our server.
Needless to say the sysadmin was not a happy camper.
3) Separate the look-up into PL/PgSQL code in the database. Instead of
issuing a direct INSERT you call a stored procedure that looks up the
entry and inserts it if not yet present. The advantage of this option
is that it keeps the bioperl-db layout largely unaffected and hence
treats all RDBMSs relatively equal (i.e., cares least about RDBMS
idiosyncrasies). The downside is that this means that the Pg version of
Biosql can't be fully auto-generated anymore from the MySQL version (I
don't want to auto-generate the PL/PgSQL procedures from a MySQL schema
definition). Also, people say PL/PgSQL is not the fastest thing out
there.
I have no experience using SPs, but when I browse others schemas I see
it used often - perhaps the best soln?
Charles
More information about the BioSQL-l
mailing list