[Biojava-l] persistence - and the problems with it

Gerald Loeffler Gerald.Loeffler@vienna.at
Tue, 02 May 2000 23:58:13 +0200


Hi!

Let me comment on something that has been said on this list on the topic
of making BioJava objects persistent:

1) Java Serialisation is a very bad way of making objects persistent. It
is very slow; it leads to _enormously_ big data stores; there are
serious problems with preservation of object identity; it is almost
impossible to handle more objects than fit into main memory at any time;
it is not transactionally safe; it does not offer networked access; it
does not offer a way to query the persistent objects; and so on and so
forth. In other words: I've never heard of a serious project that used
Java Serialisation as the persistence mechanism. (Of course it's easy to
"serialise your Java objects away" - but you cannot build up a database
of persisted objects that way!)
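
To illustrate the object-identity point, here is a minimal sketch. The
classes Seq and Feat are made-up stand-ins, not BioJava types. Two
features that share one parent sequence are serialised separately (as
one would when storing each object on its own) and come back with two
distinct copies of the parent:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for a sequence; not the real BioJava interface.
class Seq implements Serializable {
    private static final long serialVersionUID = 1L;
    final String name;
    Seq(String name) { this.name = name; }
}

// Hypothetical stand-in for a feature that points back at its sequence.
class Feat implements Serializable {
    private static final long serialVersionUID = 1L;
    final Seq parent;
    Feat(Seq parent) { this.parent = parent; }
}

class SerialisationIdentityDemo {
    // Serialise one object (and everything reachable from it) into its own byte array.
    static byte[] store(Object o) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }

    // Read one object graph back from a byte array.
    static Object load(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // True if two features that shared one parent before storing
    // still share the same parent instance after loading.
    static boolean parentIdentitySurvives() {
        Seq seq = new Seq("seq1");
        Feat a = new Feat(seq);
        Feat b = new Feat(seq);
        Feat a2 = (Feat) load(store(a));
        Feat b2 = (Feat) load(store(b));
        return a2.parent == b2.parent;
    }

    public static void main(String[] args) {
        // prints "shared parent preserved: false"
        System.out.println("shared parent preserved: " + parentIdentitySurvives());
    }
}
```

Writing both features into the same ObjectOutputStream would keep the
shared reference, but then the whole graph has to be written and read
back as one unit.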

2) It has been said that "For large companies, object databases don't
make much sense". Oh well. Firstly I think that pure object databases
_do_ make sense - but I realize that this is a religious debate to some.
Much more important is this: To use an object-oriented API for
persistence does _not_ say that you need to use a pure object-oriented
database as your database backend - you _can_ but you need not! The
persistence API is one thing - the database is another thing: it may be
relational with an object-relational mapping tool on top; or it may be a
pure object database.
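
A rough sketch of that separation, with invented names throughout
(SequenceStore, ObjectBackend and RowBackend are hypothetical and only
simulate the two kinds of backend in memory): the application code is
written against the persistence interface alone and does not change
when the backend does.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical domain class; not a BioJava type.
class NamedSequence {
    final String name;
    final String residues;
    NamedSequence(String name, String residues) {
        this.name = name;
        this.residues = residues;
    }
}

// Hypothetical persistence API: application code depends only on this.
interface SequenceStore {
    void save(NamedSequence s);
    NamedSequence findByName(String name);
}

// Stand-in for a pure object database: objects are kept as objects.
class ObjectBackend implements SequenceStore {
    private final Map<String, NamedSequence> objects = new HashMap<>();
    public void save(NamedSequence s) { objects.put(s.name, s); }
    public NamedSequence findByName(String name) { return objects.get(name); }
}

// Stand-in for a relational backend behind a mapping layer:
// objects are flattened into rows on save and rebuilt on read.
class RowBackend implements SequenceStore {
    private final Map<String, String[]> rows = new HashMap<>();
    public void save(NamedSequence s) {
        rows.put(s.name, new String[] { s.name, s.residues });
    }
    public NamedSequence findByName(String name) {
        String[] row = rows.get(name);
        return row == null ? null : new NamedSequence(row[0], row[1]);
    }
}

class BackendIndependenceDemo {
    // Application logic: identical for every backend.
    static String roundTrip(SequenceStore store) {
        store.save(new NamedSequence("seq1", "ACGT"));
        return store.findByName("seq1").residues;
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(new ObjectBackend())); // prints "ACGT"
        System.out.println(roundTrip(new RowBackend()));    // prints "ACGT"
    }
}
```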

3) The only standardised API for transparent object persistence from
Java to date is the ODMG 3.0 Java binding (http://www.odmg.org).
(Its successor, Java Data Objects, is underway:
http://java.sun.com/products/jdbc/related.html). It offers a portable,
very natural (IMHO) way to make Java objects persistent and to query the
data store for objects with certain properties. Very few implementations
of this API exist to date. There _are_ however pure object databases as
well as object-relational mapping tools that support this API - i.e.,
you can use Oracle as your database backend if you really like.

4) Writing explicit code (using JDBC) to persist a complex network of
Java objects (an Alignment and its Sequences and all its Annotations and
Features and so on) into a relational database is _very_ tedious and
error-prone. I honestly can't imagine doing this for all the classes
(interfaces) in BioJava! Besides, unless you are clever with caching and
so on, your performance will be lousy, because you are triggering _a
lot_ of very small database requests - at least one for each (usually
very fine-grained) object. And if you do clever caching, you are
essentially implementing your own object-relational mapping tool.
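
To put rough numbers on the "many small requests" problem, here is a
small sketch. FakeDatabase only counts calls and stands in for a real
JDBC round trip; Feature and CachingFeatureLoader are likewise made up
for the illustration. The cache is the simplest possible one (a map
from id to object), which is also the first building block of a
mapping tool:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical fine-grained object; not a BioJava type.
class Feature {
    final int id;
    Feature(int id) { this.id = id; }
}

// Simulated database: every call stands for one round trip.
class FakeDatabase {
    int requests = 0;
    Feature selectFeatureById(int id) {
        requests++;
        return new Feature(id);
    }
}

// Minimal cache keyed by id: each object is fetched at most once,
// and repeated lookups return the same instance.
class CachingFeatureLoader {
    private final FakeDatabase db;
    private final Map<Integer, Feature> cache = new HashMap<>();

    CachingFeatureLoader(FakeDatabase db) { this.db = db; }

    Feature load(int id) {
        Feature f = cache.get(id);
        if (f == null) {
            f = db.selectFeatureById(id);
            cache.put(id, f);
        }
        return f;
    }
}

class SmallRequestsDemo {
    // Walk over all features several times, asking the database each time.
    static int requestsWithoutCache(int features, int passes) {
        FakeDatabase db = new FakeDatabase();
        for (int p = 0; p < passes; p++) {
            for (int id = 0; id < features; id++) {
                db.selectFeatureById(id);
            }
        }
        return db.requests;
    }

    // Same walk, but going through the cache.
    static int requestsWithCache(int features, int passes) {
        FakeDatabase db = new FakeDatabase();
        CachingFeatureLoader loader = new CachingFeatureLoader(db);
        for (int p = 0; p < passes; p++) {
            for (int id = 0; id < features; id++) {
                loader.load(id);
            }
        }
        return db.requests;
    }

    public static void main(String[] args) {
        System.out.println(requestsWithoutCache(100, 3)); // prints 300
        System.out.println(requestsWithCache(100, 3));    // prints 100
    }
}
```

Even with the cache, the first pass still costs one request per object;
getting below that needs batching or prefetching of related objects,
which is more of the same machinery.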

5) Object-relational mapping tools are very sophisticated software
products - hence their price tag. The good ones transparently and
efficiently map the Java side onto the relational database side and
vice versa - i.e. they (automatically) generate and use a relational
database schema from your object model (.java files). They make all the
JDBC-coding unnecessary. They make your relational database look like an
object database (i.e. Oracle can suddenly be programmed using e.g. the
ODMG Java binding.) They cache your objects, preserve object identity
across distributed caches, know how to perform queries... I'd never
dream about starting a project to write such a beast from scratch when
there are quite a few companies who specialise in this...
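
Just to give a flavour of the schema-generation part, here is a toy
sketch that derives a CREATE TABLE statement from a class by
reflection. The class AnnotatedSequence, the type mapping and the
VARCHAR length are all assumptions made up for this example; a real
tool also has to deal with references, collections, inheritance, keys
and so on, which is exactly where the effort goes.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

// Hypothetical persistent class; not a BioJava type.
class AnnotatedSequence {
    int id;
    String name;
    String residues;
}

class SchemaSketch {
    // Assumed mapping from Java field types to SQL column types.
    static String sqlType(Class<?> t) {
        if (t == int.class || t == Integer.class) return "INTEGER";
        if (t == String.class) return "VARCHAR(255)";
        throw new IllegalArgumentException("no mapping for " + t.getName());
    }

    // Build one column per non-static, non-synthetic declared field.
    // The column order follows whatever order reflection returns.
    static String createTable(Class<?> c) {
        List<String> cols = new ArrayList<>();
        for (Field f : c.getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
                continue;
            }
            cols.add(f.getName() + " " + sqlType(f.getType()));
        }
        return "CREATE TABLE " + c.getSimpleName()
                + " (" + String.join(", ", cols) + ")";
    }

    public static void main(String[] args) {
        System.out.println(createTable(AnnotatedSequence.class));
    }
}
```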

	sorry for the many words (-:
	gerald
-- 
   Gerald.Loeffler@vienna.at _________________ Software Architect
   http://www.imp.univie.ac.at ____ http://www.daemonstration.com
   OOA&D, Java, J2EE, JSP, Servlets, JavaBeans, ODBMS, RDBMS, XML