[Biojava-l] SQL-backed persistent Biojava sequence/feature objects

Thomas Down td2@sanger.ac.uk
Mon, 30 Jul 2001 13:16:11 +0100


On Mon, Jul 30, 2001 at 12:02:11PM +0100, Ewan Birney wrote:
> On Mon, 30 Jul 2001, David Huen wrote:
> 
> > a) choice of DB
> > I've made a start on implementing the above and using Postgresql
> > for the purpose as it appears to be the only "free" database with
> > transactions implemented.  I figure we will want transactions as we will
> > either want the sequences/features completely instantiated or not at all.
> > 
> > There appears to be a JDBC for Postgresql.

There is.  In fact, it's been bundled with the PostgreSQL releases
since at least version 7.0.
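For the all-or-nothing loading David mentions, a minimal JDBC sketch might look like this. It assumes a Connection already obtained through the PostgreSQL driver (something like DriverManager.getConnection("jdbc:postgresql://host/db", user, password), with host and database as placeholders); the AllOrNothingLoader class is hypothetical, and only the java.sql calls are real API. JDBC connections commit every statement on their own by default, so the sketch switches auto-commit off for the duration of the load.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

// Hypothetical helper: runs all the INSERTs for one sequence and its
// features in a single transaction, so either every row lands or none do.
final class AllOrNothingLoader {
    private AllOrNothingLoader() {}

    static void loadAtomically(Connection conn, List<String> inserts) throws SQLException {
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);            // begin an explicit transaction
        try (Statement stmt = conn.createStatement()) {
            for (String sql : inserts) {
                stmt.executeUpdate(sql);
            }
            conn.commit();                    // all succeeded: make them visible
        } catch (SQLException e) {
            conn.rollback();                  // any failure: undo the partial load
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit); // restore the caller's setting
        }
    }
}
```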

(Aside: PostgreSQL /isn't/ the only open source DB with transactions.
There's also Borland Interbase and SAP DB, and maybe some others
too.  That said, I like PostgreSQL a lot, so I'm not complaining!)

> > The above might take some time as my SQL and JDBC are pretty much ground
> > zero.  Ach well, things can only get better...
> 
> 
> Guys - the sensible thing is to merge efforts with my Bioperl-db - no
> point having two schemas doing the same thing.

I've just been taking a look at the bioperl-db schema, and
it's certainly worth a good look by anyone who's interested
in this project.

I would say that it's quite strongly tied to BioPerl though,
or at least the BioPerl way of looking at things.  We should
look quite carefully at what the requirements are for persistence
in BioJava.  For instance, a Java-centric schema could get
away with tricks like serializing any datatypes it didn't
explicitly understand (I'm thinking particularly of
Annotation-bundle data here).  That sort of thing could probably
be piggy-backed onto the BPDB schema as an `optional extra'
without too much trouble.
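A minimal sketch of that serialization fallback, assuming the annotation values implement java.io.Serializable: each value is written with standard Java serialization and the bytes go into a binary column (BYTEA in PostgreSQL). The AnnotationBlob class and the feature_blob table with its columns are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical fallback for Annotation values the schema has no table for.
final class AnnotationBlob {
    private AnnotationBlob() {}

    // Java-serializes a value into a byte array.
    static byte[] toBytes(Serializable value) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(value);
        }
        return buf.toByteArray();
    }

    // Reverses toBytes; the value's class must be on the reader's classpath.
    static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    // Stores one entry in a hypothetical feature_blob table whose data
    // column is binary (BYTEA in PostgreSQL).
    static void store(Connection conn, int featureId, String tag, Serializable value)
            throws SQLException, IOException {
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO feature_blob (feature_id, tag, data) VALUES (?, ?, ?)")) {
            ps.setInt(1, featureId);
            ps.setString(2, tag);
            ps.setBytes(3, toBytes(value));
            ps.executeUpdate();
        }
    }
}
```

The cost is that such values are opaque to SQL queries and to non-Java clients like BioPerl, which is why it only makes sense as an optional extra on top of a shared schema.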

A rather bigger problem is hierarchical features, which I'd
say are quite important if we're aiming for `persistent
BioJava' rather than a more general database system.  This
definitely means a new schema, and quite possibly
stored procedures on the server (or something similar) to
keep the performance good -- at least given my past experiences
with hierarchical data in SQL.
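One common way to hold hierarchical features in SQL is an adjacency list, where each feature row carries its parent's ID. The sketch below is only an illustration under that assumption (the FeatureRow and FeatureTree classes and the row layout are hypothetical, not a proposed BioJava schema): it takes all the features for one sequence, fetched in a single query, and rebuilds the tree in memory, rather than issuing one query per level of the hierarchy.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical row from a feature table storing hierarchy as an adjacency
// list: parentId is null for top-level features.
final class FeatureRow {
    final int id;
    final Integer parentId;
    final String type;
    final List<FeatureRow> children = new ArrayList<>();

    FeatureRow(int id, Integer parentId, String type) {
        this.id = id;
        this.parentId = parentId;
        this.type = type;
    }
}

final class FeatureTree {
    private FeatureTree() {}

    // Links rows to their parents in two passes and returns the top-level
    // features in their original order.
    static List<FeatureRow> build(List<FeatureRow> rows) {
        Map<Integer, FeatureRow> byId = new LinkedHashMap<>();
        for (FeatureRow r : rows) {
            byId.put(r.id, r);
        }
        List<FeatureRow> roots = new ArrayList<>();
        for (FeatureRow r : rows) {
            FeatureRow parent = (r.parentId == null) ? null : byId.get(r.parentId);
            if (parent == null) {
                roots.add(r);            // top-level, or parent not in this batch
            } else {
                parent.children.add(r);
            }
        }
        return roots;
    }
}
```

This keeps reads cheap only when whole feature sets are loaded together; querying arbitrary subtrees inside SQL is where stored procedures or a different encoding would come in.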

Anyway, sorry for rambling on.  I think the point I'd like to
make is that there are two slightly different problems here:

  - A general, lightweight, database mechanism which can
    be shared between different projects.  BPDB looks like
    a reasonable schema for this sort of thing.

  - A system tuned for a particular object model, trying to
    get as close to that model as possible.  This should give
    `persistent objects' which behave almost exactly like,
    for example, the normal in-memory BioJava Sequences.
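To make the second option a little more concrete: a persistent object can implement the same interface as its in-memory counterpart and only go to the database when its data is first needed. SimpleSeq, InMemorySeq and PersistentSeq are hypothetical stand-ins, far smaller than BioJava's real Sequence interface, and in practice the Supplier would wrap a JDBC query.

```java
import java.util.function.Supplier;

// Hypothetical, much-reduced stand-in for a sequence interface.
interface SimpleSeq {
    String getName();
    int length();
    // Residues from start to end, 1-based and inclusive.
    String subStr(int start, int end);
}

// Ordinary in-memory implementation.
final class InMemorySeq implements SimpleSeq {
    private final String name;
    private final String residues;

    InMemorySeq(String name, String residues) {
        this.name = name;
        this.residues = residues;
    }

    public String getName() { return name; }
    public int length() { return residues.length(); }
    public String subStr(int start, int end) { return residues.substring(start - 1, end); }
}

// Database-backed implementation: residues are fetched on first use and
// then cached (not thread-safe), so callers see the same behaviour.
final class PersistentSeq implements SimpleSeq {
    private final String name;
    private final Supplier<String> fetchResidues; // would wrap a JDBC query
    private String residues;                      // null until first needed

    PersistentSeq(String name, Supplier<String> fetchResidues) {
        this.name = name;
        this.fetchResidues = fetchResidues;
    }

    private String residues() {
        if (residues == null) {
            residues = fetchResidues.get();
        }
        return residues;
    }

    public String getName() { return name; }
    public int length() { return residues().length(); }
    public String subStr(int start, int end) { return residues().substring(start - 1, end); }
}
```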

It's worth being clear about which of these is being addressed
before making too many commitments here.

> I would be more than happy to see the bioperl-db move from mysql to
> postgres or ideally be used by both.

Yes, that would be good.  I don't think the schema itself
will cause any trouble at all, but the code problems are hard
to avoid.  The one I always run into is generation of new IDs
when inserting data.  It would be nice to see this abstracted
away somehow, but neither JDBC nor Perl/DBI seems to do this...
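One way to abstract ID generation is a small strategy interface with one implementation per database. This sketch assumes a PostgreSQL SERIAL column called id (whose backing sequence is named table_id_seq) and a MySQL AUTO_INCREMENT column; IdStrategy, IdInserter and the name column are hypothetical.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical strategy interface hiding database-specific ID generation.
interface IdStrategy {
    // SQL that reserves an ID before the INSERT, or null if the DB assigns it.
    String preInsertSql(String table);
    // SQL that reads back the assigned ID after the INSERT, or null.
    String postInsertSql(String table);
}

// PostgreSQL: a SERIAL column "id" is backed by the sequence <table>_id_seq.
final class PostgresIds implements IdStrategy {
    public String preInsertSql(String table) { return "SELECT nextval('" + table + "_id_seq')"; }
    public String postInsertSql(String table) { return null; }
}

// MySQL: AUTO_INCREMENT fills the ID; LAST_INSERT_ID() is per-connection.
final class MySqlIds implements IdStrategy {
    public String preInsertSql(String table) { return null; }
    public String postInsertSql(String table) { return "SELECT LAST_INSERT_ID()"; }
}

final class IdInserter {
    private IdInserter() {}

    // Inserts one row into a hypothetical (id, name) table and returns its ID.
    // The table name is concatenated into SQL, so it must not be user input.
    static int insert(Connection conn, IdStrategy ids, String table, String name)
            throws SQLException {
        String pre = ids.preInsertSql(table);
        if (pre != null) {
            int id = queryInt(conn, pre);
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO " + table + " (id, name) VALUES (?, ?)")) {
                ps.setInt(1, id);
                ps.setString(2, name);
                ps.executeUpdate();
            }
            return id;
        }
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO " + table + " (name) VALUES (?)")) {
            ps.setString(1, name);
            ps.executeUpdate();
        }
        return queryInt(conn, ids.postInsertSql(table));
    }

    private static int queryInt(Connection conn, String sql) throws SQLException {
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery(sql)) {
            if (!rs.next()) {
                throw new SQLException("no ID returned by: " + sql);
            }
            return rs.getInt(1);
        }
    }
}
```

The awkward part is the ordering: PostgreSQL lets the ID be reserved before the INSERT, while MySQL only reports it afterwards, and the calling code has to cope with both.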


    Thomas.