[Biojava-l] databases

Matthew Pocock mrp@sanger.ac.uk
Fri, 14 Jul 2000 17:40:17 +0100


Dear all,

The manuscript for SIGBIO has been submitted. Thank you to everybody who made
suggestions.

I have added org.biojava.bio.db.CachingDatabase, which decorates a parent
database, adding memory-sensitive caching of sequences. This means that
you can safely have a cached view of Embl: when you drop all
references to one of the Embl sequences in your script, the sequence
will be held in the cache until the VM needs the memory for something
else.
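
A minimal sketch of the memory-sensitive caching idea, not the actual
CachingDatabase code. It assumes the cache is built on
java.lang.ref.SoftReference, which the VM may clear when it runs short of
memory; the SoftCache class and its lookup function are hypothetical
stand-ins for the decorator and its parent database.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a caching decorator; not the BioJava API.
final class SoftCache<K, V> {
    // The "parent database": any function that fetches a value by key.
    private final Function<K, V> parent;
    // Values are held through soft references, so the garbage collector
    // may reclaim them once nothing else refers to them and memory is tight.
    private final Map<K, SoftReference<V>> cache = new HashMap<>();

    SoftCache(Function<K, V> parent) {
        this.parent = parent;
    }

    synchronized V get(K key) {
        SoftReference<V> ref = cache.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            // Cache miss, or the entry was cleared by the collector:
            // go back to the parent and cache the result again.
            value = parent.apply(key);
            cache.put(key, new SoftReference<>(value));
        }
        return value;
    }
}
```

While a caller still holds a strong reference to a sequence, repeated
lookups return the same object without touching the parent. Once all strong
references are dropped, the entry survives only until the VM wants the
memory back.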

I have also added (cowering and waiting for flames) IndexedSequenceDB,
which uses a standard IO object to index a group of sequence files. This
is a bit of a hack and won't scale well to HUGE databases, but it is ideal
for the short-term, small-to-medium databases that are often generated as
intermediate steps in scripts. demos/seq/db contains some utilities for
instantiating indexes, adding files, listing all sequences and
retrieving sequences. It worked for me, but you may be able to break it.
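
To illustrate the general idea of indexing sequence files, here is a small
sketch. It is not the IndexedSequenceDB implementation: it assumes FASTA
input, a single file, and an in-memory map from sequence ID to byte offset,
and the FastaOffsetIndex class is a hypothetical name.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical offset index over one FASTA file; not the BioJava API.
final class FastaOffsetIndex {
    private final String path;
    // Sequence ID -> byte offset of its '>' header line.
    private final Map<String, Long> offsets = new LinkedHashMap<>();

    FastaOffsetIndex(String path) throws IOException {
        this.path = path;
        try (RandomAccessFile in = new RandomAccessFile(path, "r")) {
            long pos = in.getFilePointer();
            String line;
            while ((line = in.readLine()) != null) {
                if (line.startsWith(">")) {
                    // Take the first whitespace-delimited token as the ID.
                    String id = line.substring(1).trim().split("\\s+")[0];
                    offsets.put(id, pos);
                }
                pos = in.getFilePointer();
            }
        }
    }

    // Lists all indexed IDs in file order.
    Iterable<String> ids() {
        return offsets.keySet();
    }

    // Seeks straight to the record and reads residues up to the next header.
    String fetch(String id) throws IOException {
        Long start = offsets.get(id);
        if (start == null) {
            throw new IllegalArgumentException("unknown id: " + id);
        }
        StringBuilder seq = new StringBuilder();
        try (RandomAccessFile in = new RandomAccessFile(path, "r")) {
            in.seek(start);
            in.readLine(); // skip the header line
            String line;
            while ((line = in.readLine()) != null && !line.startsWith(">")) {
                seq.append(line.trim());
            }
        }
        return seq.toString();
    }
}
```

Keeping the whole index in memory and reading line by line is what makes an
approach like this fine for small-to-medium files and a poor fit for very
large ones.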

I found lots of serialization issues in the core classes. I hope that
these are now fixed. I apologise if the fixes break really old code. Anything
serialized within the last fortnight probably won't deserialize well
regardless of these changes. I guess for 1.0 we must lock all the
serial UIDs down.
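
For anyone unfamiliar with what locking a serial UID means: if a
Serializable class does not declare serialVersionUID, Java computes one from
the class structure, so editing the class can make previously serialized
objects fail to load with an InvalidClassException. Declaring the field
explicitly pins it. A minimal sketch, where SimpleSeq and SerialDemo are
hypothetical illustration classes and not BioJava core classes:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical class used only for illustration.
final class SimpleSeq implements Serializable {
    // Pinned explicitly, so compatible edits to the class do not
    // change the stream version and invalidate old serialized data.
    private static final long serialVersionUID = 1L;

    final String id;
    final String residues;

    SimpleSeq(String id, String residues) {
        this.id = id;
        this.residues = residues;
    }
}

final class SerialDemo {
    // Serializes an object to a byte array.
    static byte[] toBytes(Serializable obj) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
            out.writeObject(obj);
        }
        return buf.toByteArray();
    }

    // Reads an object back from a byte array.
    static Object fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }
}
```

A pinned UID only helps while changes stay compatible (for example, adding
a field). Incompatible changes still need a new UID or custom
readObject/writeObject handling.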

Thanks again for the help with the manuscript.

Matthew
--
Joon: You're out of your tree
Sam:  It wasn't my tree
                                                 (Benny & Joon)