[Biojava-l] Re: Biojava-l digest, Vol 1 #64 - 5 msgs (Object persistence)

Aaron Kitzmiller AKitzmiller@genetics.com
Fri, 28 Apr 2000 14:36:18 -0400


A couple of us at Genetics Institute have been working with biojava objects with the express purpose of using them in persistence-aware applications.  For large companies, object databases don't make much sense, so we've been developing some object/relational mapping strategies that might be generally useful.  Though the code is in early stages right now, it may be usable in a month or two and we can certainly put it out there.

The idea is pretty simple and, I believe, inspired by TOPLink and other O/R mapping tools.  Classes that are supposed to be persisted implement the Persistable interface.  The interface is pretty simple and includes things like getPrimaryKey() and save().  It also includes a method called getBroker() that retrieves an object that implements the Broker interface.  The Broker class, which is implemented for a particular type of object, actually handles the SQL necessary to return a set of objects that match query criteria, create the object, and insert or update an object.  It's just a single place to do all of your SQL.  You don't get much out-of-the-box with these interfaces since they have to be constructed for a particular database schema.  However, you get a fairly flexible and consistent way to isolate your data objects from the way in which they are created.   
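
A rough sketch of how the pieces described above could fit together. Only getPrimaryKey(), save(), getBroker() and the Persistable/Broker names come from the description; everything else (the generic parameters, findByName, insertOrUpdate, SequenceRecord, and the in-memory map standing in for a relational table) is a hypothetical illustration, not the Genetics Institute code. A real Broker would presumably issue SQL through JDBC where this one touches the map.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical Broker: the single place where storage access for one type lives.
interface Broker<T> {
    List<T> findByName(String name);   // query by a simple criterion (assumed method)
    void insertOrUpdate(T object);     // insert a new row or update an existing one
}

// Hypothetical Persistable: the three methods named in the post.
interface Persistable<T> {
    Object getPrimaryKey();
    Broker<T> getBroker();
    void save();
}

// Stand-in broker that keeps rows in memory instead of running SQL.
final class SequenceRecordBroker implements Broker<SequenceRecord> {
    private final Map<Object, SequenceRecord> rows = new LinkedHashMap<>();

    @Override
    public List<SequenceRecord> findByName(String name) {
        List<SequenceRecord> hits = new ArrayList<>();
        for (SequenceRecord row : rows.values()) {
            if (row.getName().equals(name)) {
                hits.add(row);
            }
        }
        return hits;
    }

    @Override
    public void insertOrUpdate(SequenceRecord record) {
        // Same primary key replaces the old row; a new key adds one.
        rows.put(record.getPrimaryKey(), record);
    }
}

// Illustrative data object; it knows nothing about how it is stored.
final class SequenceRecord implements Persistable<SequenceRecord> {
    private static final Broker<SequenceRecord> BROKER = new SequenceRecordBroker();

    private final String accession;
    private final String name;
    private final String residues;

    SequenceRecord(String accession, String name, String residues) {
        this.accession = accession;
        this.name = name;
        this.residues = residues;
    }

    String getName() { return name; }
    String getResidues() { return residues; }

    @Override public Object getPrimaryKey() { return accession; }
    @Override public Broker<SequenceRecord> getBroker() { return BROKER; }
    @Override public void save() { getBroker().insertOrUpdate(this); }
}
```

Calling save() on two SequenceRecord objects with the same accession leaves one stored row holding the later values, which is the insert-or-update behaviour described above.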

If there is enough interest (and we can get it past the censors), we'll submit the code to biojava.  We can probably even submit the code for the particular relational database that we're using.  



>>> <biojava-l-admin@biojava.org> 04/27 12:00 PM >>>

Today's Topics:

  1. Ace client & Corba (Matthew Pocock)
  2. persistence for org.biojava.bio.seq and org.biojava.bio.symbol (Gerald Loeffler)
  3. Re: persistence for org.biojava.bio.seq and org.biojava.bio.symbol (Tom Oinn)
  4. Re: persistence for org.biojava.bio.seq and org.biojava.bio.symbol (Thomas Down)

--__--__--

Message: 1
Date: Wed, 26 Apr 2000 18:24:11 +0100
From: Matthew Pocock <mrp@sanger.ac.uk>
Organization: The Sanger Center
To: "biojava-l@biojava.org" <biojava-l@biojava.org>
Subject: [Biojava-l] Ace client & Corba

Dear all,

I have checked in some new demos. EmblToFastaGFF is self-explanatory,
but a useful utility all the same. AceClient is an ace command line
client using the rather excellent ace socket server & the pure-java
implementation by Thomas. SequenceFetch allows you to fetch sequences by
name from an ace database and save them as fasta files. ServeAceAsCorba
uses the Ace client to read sequences from an Ace database, and serves
them as BioCorba objects.

While doing this I fixed numerous bugs in the ace and corba code - some
were caused during the renaming and some are long-standing.

Have fun - tell me if the demos break.

Matthew
--
Joon: You're out of your tree
Sam:  It wasn't my tree
                                                 (Benny & Joon)



--__--__--

Message: 2
Date: Wed, 26 Apr 2000 22:52:47 +0200
From: Gerald Loeffler <Gerald.Loeffler@vienna.at>
Reply-To: Gerald.Loeffler@vienna.at 
To: biojava-l@biojava.org 
Subject: [Biojava-l] persistence for org.biojava.bio.seq and org.biojava.bio.symbol

Hi!

Are there any opinions/experiences/implementations out there regarding
the persistence of objects from the org.biojava.bio.seq and
org.biojava.bio.symbol packages, most notably Sequence?!

E.g. imagine that a tool allows users to (interactively) create Sequence
objects, complete with SymbolList, Features and Annotations.
Additionally, Alignment objects might be constructed that make use of
these Sequence objects. How are we to make the so created (highly
complex) graph of objects persistent, such that all objects will be
available in later invocations of the tool?

Two extreme mechanisms of persistence seem logical:
  1) persisting a Sequence object (and the object graph originating at
this object) to a well-known flat-file format (EMBL, GenBank, MSF, ...)
(with the advantages of platform-neutrality, familiarity, easy exchange,
and a host of existing tools that could potentially use it, etc.). 
  2) persisting the object graph more or less directly to an
object-oriented database (with the advantages of speed of access,
transaction-safety, networked access, access through (OQL or SQL)
queries, automated indexing, etc., etc.). The main disadvantage is
probably that the persistent store in this way depends on implementation
details of the persistent classes (ODBMSs store object fields and if you
change the fields of your persistent classes you risk not being able to
read objects from the ODBMS - even if the external contract of your
class (its protected and public methods) remains entirely intact!).

(Recently I've made intimate contact with object-oriented databases from
Java (see ODMG 3.0 under http://www.odmg.org/) and grew quite fond of
them - especially in contrast to relational databases when the
object model to persist is very complicated (as the one implemented in
org.biojava.bio.seq and org.biojava.bio.symbol surely is)!)

Any opinions on this?

	cheers,
	gerald

P.S.: I'm off now to a 6-day holiday, so I'll only be able to respond to
your mails after my return...
-- 
   Gerald.Loeffler@vienna.at _________________ Software Architect
   http://www.imp.univie.ac.at ____ http://www.daemonstration.com 
   OOA&D, Java, J2EE, JSP, Servlets, JavaBeans, ODBMS, RDBMS, XML

--__--__--

Message: 3
Date: Thu, 27 Apr 2000 10:10:10 +0100
From: Tom Oinn <tmo@ebi.ac.uk>
To: biojava-l@biojava.org 
Subject: Re: [Biojava-l] persistence for org.biojava.bio.seq and org.biojava.bio.symbol

Gerald Loeffler wrote:
> 
> Hi!
> 
> Are there any opinions/experiences/implementations out there regarding
> the persistence of objects from the org.biojava.bio.seq and
> org.biojava.bio.symbol packages, most notably Sequence?!
> 
> E.g. imagine that a tool allows users to (interactively) create Sequence
> objects, complete with SymbolList, Features and Annotations.
> Additionally, Alignment objects might be constructed that make use of
> these Sequence objects. How are we to make the so created (highly
> complex) graph of objects persistent, such that all objects will be
> available in later invocations of the tool?

Surely the best way to store your data structure for later use by the
same tool (or other biojava based code) would be to serialize the
sequence object (either to disk or as a blob in oracle etc.). This is
trivial, platform independent (subject to use of biojava on that
platform) and pretty robust.
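
A minimal sketch of the round trip suggested above, using standard Java object serialization. AnnotatedSequence is a hypothetical stand-in class, not a BioJava type, and the byte array stands in for either a file on disk or a database blob column.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a sequence with attached features.
final class AnnotatedSequence implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String name;
    private final String residues;
    private final List<String> features = new ArrayList<>();

    AnnotatedSequence(String name, String residues) {
        this.name = name;
        this.residues = residues;
    }

    void addFeature(String description) { features.add(description); }
    String getName() { return name; }
    String getResidues() { return residues; }
    List<String> getFeatures() { return features; }
}

final class SerializationDemo {
    // Write the whole object graph reachable from obj into a byte array.
    static byte[] toBytes(Serializable obj) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Rebuild the object graph from bytes produced by toBytes.
    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The restored object is a distinct copy with the same field values. Reading it back requires the same classes, in a compatible version, on the classpath.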

I would advise against storing any complex interlinked data structure
in a flat file if you can possibly avoid it. If you do want to store to
file in a way that isn't Java-specific, I'd suggest some non-flat format
such as XML with XLink. That way you can easily retrieve your complex
data afterwards.

Hope this is of some help.

Tom

--
Using new ram-scoop infusion techniques, Undead Boars are twice as 
undead as, say, Windows NT or IRIX. I find Undead Boars to be an 
invaluable asset whether I'm hunting fierce animals or managing a 
massive heterogeneous network.

--__--__--

Message: 4
Date: Thu, 27 Apr 2000 10:20:54 +0100
From: Thomas Down <td2@sanger.ac.uk>
To: Tom Oinn <tmo@ebi.ac.uk>
Cc: biojava-l@biojava.org 
Subject: Re: [Biojava-l] persistence for org.biojava.bio.seq and org.biojava.bio.symbol
Organization: This tangled web on which I'm laid intwined

On Thu, Apr 27, 2000 at 10:10:10AM +0100, Tom Oinn wrote:
> 
> Surely the best way to store your data structure for later use by the
> same tool (or other biojava based code) would be to serialize the
> sequence object (either to disk or as a blob in oracle etc.). This is
> trivial, platform independent (subject to use of biojava on that
> platform) and pretty robust.

Yes, I'd agree that Serialization is probably the right way
to go, unless you particularly need a method of interworking
data with non-Biojava tools.

Serialization support is a goal for biojava, but it's not
fully implemented yet.  The main issues are that Biojava
relies heavily on the uniqueness of some critical singleton
objects: for instance, there should only be one instance
of the DNA alphabet in a given virtual machine.  Similarly,
there should only be single instances of each of the Symbols
in the well-known alphabets.

Some time ago, I implemented some trickery which should
allow well-known alphabets and symbols to be serialized
safely.  What /hasn't/ been done is to go through and
add `implements Serializable' to other classes which
WILL serialize safely, but haven't actually been tested.
SimpleSymbolList, for instance, probably falls into this
category.
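
For illustration, one standard Java mechanism for keeping singletons unique across serialization is a readResolve() method. The sketch below assumes a hypothetical DemoAlphabet class; the message does not say what the "trickery" actually is, so this is not a description of it, only an example of the general problem and one way round it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamException;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical alphabet with exactly one well-known instance per virtual machine.
final class DemoAlphabet implements Serializable {
    private static final long serialVersionUID = 1L;

    static final DemoAlphabet DNA = new DemoAlphabet("DNA");

    private final String name;

    private DemoAlphabet(String name) { this.name = name; }

    String getName() { return name; }

    // Invoked by the serialization runtime on the freshly deserialized copy;
    // returning the canonical instance discards the copy.
    private Object readResolve() throws ObjectStreamException {
        if ("DNA".equals(name)) {
            return DNA;
        }
        throw new InvalidObjectException("Unknown alphabet: " + name);
    }
}

final class SingletonDemo {
    // Serialize obj to memory and read it straight back.
    static Object roundTrip(Object obj) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With readResolve() in place, SingletonDemo.roundTrip(DemoAlphabet.DNA) returns the very same object as DemoAlphabet.DNA, so identity comparisons (==) keep working after deserialization. Without it, the round trip would yield a second, distinct instance.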

From now on, it's probably worth making a point of making
all the core BioJava objects serializable.

Happy hacking,
  Thomas.
-- 
There are whose study is of smells
And to attentive schools rehearse
How something mixed with something else
Makes something worse.




--__--__--

_______________________________________________
Biojava-l mailing list  -  Biojava-l@biojava.org 
http://biojava.org/mailman/listinfo/biojava-l 


End of Biojava-l Digest

Aaron Kitzmiller
Manager Systems Development -Cambridge
Bioinformatics Department
35 Cambridge Park Dr.
Cambridge, MA 02140
Phone: (617) 665-6831
Fax: (617) 665-8870
Email: akitzmiller@genetics.com