[Biojava-l] Serializable

Thomas Down td2@sanger.ac.uk
Tue, 6 Nov 2001 20:33:29 +0000


On Wed, Nov 07, 2001 at 09:13:14AM +1300, Schreiber, Mark wrote:
> 
> I like the idea of serialization tests and I would write some myself but
> my thesis is due at the end of this year and I've got a 3 week trip to
> the US and UK to squeeze in ;-)

I know the feeling!  Good luck...

> I imagine that a small number of tests could test a large number of
> serializations, i.e. if a SequenceDB is filled with various types of
> sequences, some view sequences some standard, DNA, protein, RNA etc
> along with a range of features, nested and otherwise. I think that
> should cover most of the core classes. Importantly it would need to call
> methods that rely on transient variables to make sure they had been
> rebuilt properly.

Yes, that sort of thing should work quite well.  We probably
need to try de-serializing some pre-serialized files, as
well as doing round-trip tests, since there are possible
cases where a serialized stream will only be valid in a
particular VM (see below).
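
As a rough sketch of the two kinds of test, assuming a made-up class
(Tagged) standing in for a real BioJava class, and a temporary file
standing in for a fixture that would really be written once by another
VM and kept with the tests:

```java
import java.io.*;
import java.nio.file.*;

// Hypothetical test helper; none of these names come from BioJava.
class SerializationChecks {
    static byte[] toBytes(Serializable obj) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(obj);
        }
        return buf.toByteArray();
    }

    static Object fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        // Round trip within a single VM.
        Tagged copy = (Tagged) fromBytes(toBytes(new Tagged("gattaca")));
        System.out.println(copy.residues() + " " + copy.length());

        // Fixture-style test. Here the file is written on the spot only so
        // the sketch runs; a real fixture would be produced elsewhere.
        Path fixture = Files.createTempFile("tagged", ".ser");
        Files.write(fixture, toBytes(new Tagged("acgt")));
        Tagged stored = (Tagged) fromBytes(Files.readAllBytes(fixture));
        System.out.println(stored.residues() + " " + stored.length());
        Files.delete(fixture);
    }
}

// Hypothetical class with a transient field that must be rebuilt.
class Tagged implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String residues;
    private transient int length; // derived from residues, never written to the stream

    Tagged(String residues) {
        this.residues = residues;
        this.length = residues.length();
    }

    String residues() { return residues; }

    // Depends on the transient field, so calling it after deserialization
    // checks that the field was rebuilt.
    int length() { return length; }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        length = residues.length(); // rebuild the transient state
    }
}
```

The round trip alone cannot catch a stream that is only valid in the VM
that wrote it, which is why the stored-file check is worth having too.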

> I made a modification to simple distribution to do custom serialization
> namely the AlphabetIndexer is transient and a private readObject()
> method is added to rebuild the AlphabetIndexer when the class is
> deserialized. This seems to work for me, do you think it is safe?

I actually made a similar change just before I went away, but
I'm not sure whether I checked it in or not.  Probably not,
by the sound of things.  The only difference was that instead
of having a readObject(), I had a protected getIndexer() method,
which re-created the transient indexer on demand (I slightly
prefer this pattern, but it makes very little difference).
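
For comparison, a minimal sketch of the two patterns side by side.  The
class names and the Map used as an "indexer" are invented for
illustration and are not the real distribution or indexer classes:

```java
import java.io.*;
import java.util.*;

// Hypothetical demo; none of these names come from BioJava.
class IndexerPatterns {
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T copy(T obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(copy(new EagerDist("a", "c", "g", "t")).indexOf("g"));
        System.out.println(copy(new LazyDist("a", "c", "g", "t")).indexOf("g"));
    }
}

// Pattern A: rebuild the transient indexer in a private readObject().
class EagerDist implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String[] symbols;
    private transient Map<String, Integer> indexer;

    EagerDist(String... symbols) {
        this.symbols = symbols.clone();
        this.indexer = buildIndexer(this.symbols);
    }

    static Map<String, Integer> buildIndexer(String[] symbols) {
        Map<String, Integer> m = new HashMap<>();
        for (int i = 0; i < symbols.length; i++) m.put(symbols[i], i);
        return m;
    }

    int indexOf(String s) {
        Integer i = indexer.get(s);
        return i == null ? -1 : i;
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        indexer = buildIndexer(symbols);
    }
}

// Pattern B: re-create the transient indexer on demand in a protected getter.
class LazyDist implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String[] symbols;
    private transient Map<String, Integer> indexer; // null after deserialization

    LazyDist(String... symbols) {
        this.symbols = symbols.clone();
    }

    // Not synchronized: two threads may each build an indexer once.
    protected Map<String, Integer> getIndexer() {
        if (indexer == null) indexer = EagerDist.buildIndexer(symbols);
        return indexer;
    }

    int indexOf(String s) {
        Integer i = getIndexer().get(s);
        return i == null ? -1 : i;
    }
}
```

With the getter, every use of the indexer has to go through
getIndexer() rather than the field; with readObject(), the rebuild
happens once, in one place.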

But there's actually a much bigger problem -- two AlphabetIndexes
of the same Alphabet aren't guaranteed to order the symbols
the same way.  It happens that our current implementations
seem to give pretty stable numbering, but there's potential
for all sorts of horrible Heisenbugs.  This is particularly
likely if serialization is being used to move objects from
one VM implementation to another.

This is why I proposed the more complex (but reliable) method
of explicitly serializing symbol-weight tuples.
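
Roughly, the idea looks like the sketch below.  Everything in it is an
assumption made for illustration: symbols are reduced to plain strings,
WeightedDist is a made-up class, and the reading side deliberately
numbers the symbols in a different (sorted) order to mimic an indexer
that disagrees with the one in the writing VM.

```java
import java.io.*;
import java.util.*;

// Hypothetical demo; none of these names come from BioJava.
class TupleDemo {
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T copy(T obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        WeightedDist d = new WeightedDist(Arrays.asList("t", "g", "c", "a"));
        d.setWeight("t", 0.4);
        d.setWeight("a", 0.1);
        WeightedDist c = copy(d);
        System.out.println("slot of t: " + d.slotOf("t") + " -> " + c.slotOf("t"));
        System.out.println("weight of t: " + d.getWeight("t") + " -> " + c.getWeight("t"));
    }
}

class WeightedDist implements Serializable {
    private static final long serialVersionUID = 1L;
    private transient Map<String, Integer> index; // symbol -> slot, local to this VM
    private transient double[] weights;           // indexed by slot

    WeightedDist(List<String> symbolOrder) {
        init(symbolOrder);
    }

    private void init(List<String> order) {
        index = new LinkedHashMap<>();
        for (String s : order) index.put(s, index.size());
        weights = new double[index.size()];
    }

    int slotOf(String s) {
        Integer i = index.get(s);
        if (i == null) throw new IllegalArgumentException("unknown symbol: " + s);
        return i;
    }

    void setWeight(String s, double w) { weights[slotOf(s)] = w; }
    double getWeight(String s) { return weights[slotOf(s)]; }

    // Write (symbol, weight) pairs; slot numbers never reach the stream.
    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        out.writeInt(index.size());
        for (Map.Entry<String, Integer> e : index.entrySet()) {
            out.writeUTF(e.getKey());
            out.writeDouble(weights[e.getValue()]);
        }
    }

    // Read the pairs back and file each weight under whatever slot the
    // local numbering (here: sorted order) assigns to that symbol.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        int n = in.readInt();
        Map<String, Double> pairs = new HashMap<>();
        for (int i = 0; i < n; i++) {
            String name = in.readUTF();
            pairs.put(name, in.readDouble());
        }
        init(new ArrayList<>(new TreeSet<>(pairs.keySet())));
        for (Map.Entry<String, Double> e : pairs.entrySet()) {
            setWeight(e.getKey(), e.getValue());
        }
    }
}
```

The slot for a symbol changes across the round trip, but each weight
stays attached to its symbol, which is the property that writing the
raw weight array cannot give.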

> Possibly we should think about making all classes Serializable unless
> there is a reason why they should be rebuilt on deserialization. This
> approach is recommended in the book Thinking in Java.

Definitely.

   Thomas.