[Biojava-l] persistence - and the problems with it

Simon Brocklehurst simon.brocklehurst@CambridgeAntibody.com
Wed, 03 May 2000 15:45:37 +0100


Hi Gerald,

I thought you'd have caused more of a stir with that post - I certainly
enjoyed it! Seeing as no-one else has replied yet...

Gerald Loeffler wrote:

> Hi!
>
> Let me comment on something that has been said on this list on the topic
> of making BioJava objects persistent:
>
> 1) Java Serialisation is a very bad way of making objects persistent.

Agreed!  There are just sooooooo many bad things about Java serialization...

> 4) Writing explicit code (using JDBC) to persist a complex network of
> Java objects (an Alignment and its Sequences and all its Annotations and
> Features and so on) into a relational database is _very_ tedious and
> error-prone. I honestly can't imagine doing this for all the classes
> (interfaces) in BioJava!

Yes, it's tedious, and it's easy to make errors when writing lots of
database calls with JDBC, but it's a long way from being impossible to do
correctly.

> Besides, unless you are clever with caching and
> so on, your performance will be lousy (because you are triggering _a
> lot_ of very small database requests - at least one for each (usually
> very fine grained) object.) - and if you do clever caching, you are
> essentially implementing your own object-relational mapping tool:

True - but I'm not exactly clear what you're trying to say here. I'm
definitely getting the impression that you personally don't want to do this
;-) But do you think other people:

  a) Shouldn't do it
  b) Can't do it
  c) Should go ahead and do it if they want to
  d) Should consider collaborating on a general open source
     object-relational mapping tool, rather than writing something specific
     for biojava
  e) Should do something else
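
To make the caching point concrete, here is a minimal sketch of the kind of
identity map that "clever caching" implies: each database key maps to exactly
one in-memory object, and repeated lookups never go back to the database. The
class name IdentityCache and the loader function (standing in for a JDBC
query) are made up for illustration; nothing like this is part of biojava, and
a real mapping tool would also have to deal with eviction, transactions and
concurrent access.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical identity map: at most one in-memory object per key. */
class IdentityCache<K, V> {
    private final Map<K, V> loaded = new HashMap<>();
    private final Function<K, V> loader; // assumption: stands in for a JDBC lookup
    private int loadCount = 0;

    IdentityCache(Function<K, V> loader) {
        this.loader = loader;
    }

    /** Returns the cached object; calls the loader only if the key is not cached yet. */
    V get(K key) {
        V value = loaded.get(key);
        if (value == null) {
            value = loader.apply(key); // the expensive round trip happens here
            loadCount++;
            loaded.put(key, value);
        }
        return value;
    }

    /** Number of times the loader was actually invoked. */
    int getLoadCount() {
        return loadCount;
    }
}
```

Asking for the same key twice returns the very same object and triggers only
one load, which is the property that keeps lots of fine-grained objects from
turning into lots of tiny database requests.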

> 5) Object-relational mapping tools are very sophisticated software
> products - hence their price tag. The good ones transparently and
> efficiently map the Java-side onto the relational database-side and
> vice-versa - i.e. they (automatically) generate and use a relational
> database schema from your object model (.java files). They make all the
> JDBC-coding unnecessary. They make your relational database look like an
> object database (i.e. Oracle can suddenly be programmed using e.g. the
> ODMG Java binding.) They cache your objects, preserve object identity
> across distributed caches, know how to perform queries... I'd never
> dream about starting a project to write such a beast from scratch when
> there are quite a few companies who specialise in this...

That's fine.  The only thing is, much of the potential user community of
Biojava may not have the budget to buy expensive Enterprise-class software.
So if persistence of objects is a goal of biojava, whatever the solution is
should probably not rely on costly infrastructure.

Are there any mature, high-quality, feature-rich tools that get you where
you need to go in terms of developing high-performance systems?  In your
experience, what are the best commercial Java object-relational mapping
tools?  What are the benefits, if any, of the commercial tools over the free
tools?

Also which do you think is the best pure object-database for dealing with
Java objects?

You didn't discuss using XML representations of biojava objects.  That
might offer a reasonable way to allow a wide variety of types of user to
exploit biojava. Once you have the XML you can do what you like with it...
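
As a rough illustration of what I mean, the sketch below writes a named
sequence out as a small XML fragment using the standard javax.xml.stream
API. The class SequenceXml, the method toXml and the element and attribute
names are all invented for this example; they are not an existing biojava
interface or an agreed format.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

/** Hypothetical helper: serialises a name plus residues to an XML fragment. */
class SequenceXml {
    static String toXml(String name, String residues) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeStartElement("sequence");   // made-up element name
            xml.writeAttribute("name", name);    // made-up attribute name
            xml.writeCharacters(residues);       // special characters are escaped by the writer
            xml.writeEndElement();
            xml.flush();
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not write sequence XML", e);
        }
        return out.toString();
    }
}
```

Calling SequenceXml.toXml("seq1", "ACGT") gives a single <sequence> element
that any XML-aware tool, in any language, could then pick up.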

What do you think?
--
Simon M. Brocklehurst, Ph.D.
Head of Bioinformatics & Advanced IS
Cambridge Antibody Technology
The Science Park, Melbourn, Cambridgeshire, UK
http://www.CambridgeAntibody.com/
mailto:simon.brocklehurst@CambridgeAntibody.com