[Biojava-l] IndexedSequenceDB performance improvements

Keith James kdj@sanger.ac.uk
17 Aug 2001 22:36:41 +0100


I've been experimenting with making IndexedSequenceDB scale to the
whole of EMBL and have made a few changes to the getSequence()
method. This has given a fairly massive improvement on big files (no
exact figures, I'm afraid, because Compaq have kindly disabled the
profiler in their latest JDK *sigh*).

There is a class org.biojava.utils.io.RandomAccessReader which was
necessary for EMBL support. It's a buffered Reader which wraps a
RandomAccessFile. I put it in utils, rather than having it as an inner
class, as it could well be more generally useful.
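
For anyone wondering what a buffered Reader over a RandomAccessFile
looks like in general, here is a minimal sketch. It is not the code in
org.biojava.utils.io.RandomAccessReader; the class name
SeekableFileReader, its seek() method and the buffer-size parameter are
all made up for illustration, and it assumes single-byte characters
(ASCII/Latin-1), which is a simplification.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.Reader;

// Hypothetical sketch, NOT the BioJava source: a Reader that buffers
// reads from a RandomAccessFile and can jump to a byte offset.
class SeekableFileReader extends Reader {
    private final RandomAccessFile raf;
    private final byte[] buffer;
    private int bufPos = 0; // index of the next unread byte in buffer
    private int bufLen = 0; // number of valid bytes currently in buffer

    SeekableFileReader(RandomAccessFile raf, int bufferSize) {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        this.raf = raf;
        this.buffer = new byte[bufferSize];
    }

    // Move to an absolute byte offset and throw away any buffered bytes,
    // so the next read starts exactly at that offset.
    void seek(long offset) throws IOException {
        raf.seek(offset);
        bufPos = 0;
        bufLen = 0;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        if (bufPos >= bufLen) {
            // Buffer exhausted: refill it with one bulk read from the file.
            int n = raf.read(buffer, 0, buffer.length);
            bufPos = 0;
            if (n <= 0) {
                bufLen = 0;
                return -1; // end of file
            }
            bufLen = n;
        }
        int count = Math.min(len, bufLen - bufPos);
        for (int i = 0; i < count; i++) {
            // Assumption: one byte maps to one char (ASCII/Latin-1).
            cbuf[off + i] = (char) (buffer[bufPos++] & 0xFF);
        }
        return count;
    }

    @Override
    public void close() throws IOException {
        raf.close();
    }
}
```

With something like this, a caller would seek() to an offset taken from
an index and then wrap the reader in a java.io.BufferedReader to read
lines from that point. A fresh BufferedReader is needed after each
seek(), since the outer wrapper holds its own read-ahead.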

For retrieving a single sequence from the Sanger EMBL flatfiles, most
of the time is spent starting the VM. I'll get some benchmarks done
soon.

-- 

-= Keith James - kdj@sanger.ac.uk - http://www.sanger.ac.uk/Users/kdj =-
The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambs CB10 1SA