[Biojava-l] readGenbank performance

David P Dean deandp@groton.pfizer.com
Tue, 29 Oct 2002 16:02:11 -0500


Hi,
I'm new to BioJava and am very keen to learn more about it. I've got a
routine that reads some GenBank sequences and processes them, and that
works fine. But I'm surprised it doesn't run faster. A basic read loop like:

     SequenceIterator sit = SeqIOTools.readGenbank(br);
     while( sit.hasNext() ) {
        Sequence entry = sit.nextSequence();
        // ... process entry ...
     }

takes about 90 seconds to read 10,000 GenBank EST entries on my Sparc
Ultra 10. A comparable Perl library I have that iterates over the set
and parses all the records takes about half the time. Is this expected,
or are there any suggestions?
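
For reference, 10,000 entries in 90 seconds works out to roughly 111
entries per second, or about 9 ms per entry. A minimal sketch of how a
read loop like the one above might be timed is shown here. The class
name ReadTimer and its Result type are hypothetical, and a plain
java.util.Iterator stands in for the BioJava SequenceIterator so the
example runs without the library. It is written for a current JVM; on
1.3 one would have to drop the generics and use
System.currentTimeMillis() instead of System.nanoTime().

```java
import java.util.Arrays;
import java.util.Iterator;

public class ReadTimer {

    /** Number of entries consumed and the wall-clock time it took. */
    public static final class Result {
        public final int count;
        public final long elapsedNanos;

        Result(int count, long elapsedNanos) {
            this.count = count;
            this.elapsedNanos = elapsedNanos;
        }

        /** Throughput in entries per second. */
        public double entriesPerSecond() {
            if (elapsedNanos == 0) {
                return Double.POSITIVE_INFINITY;
            }
            return count / (elapsedNanos / 1e9);
        }
    }

    /** Drains the iterator and measures how long that takes. */
    public static Result time(Iterator<?> it) {
        long start = System.nanoTime();
        int count = 0;
        while (it.hasNext()) {
            it.next();   // stand-in for sit.nextSequence()
            count++;
        }
        return new Result(count, System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // Placeholder data; a real run would iterate over parsed entries.
        Iterator<String> it =
            Arrays.asList("entry1", "entry2", "entry3").iterator();
        Result r = time(it);
        System.out.println(r.count + " entries in "
            + (r.elapsedNanos / 1e6) + " ms");
    }
}
```

Timing the whole loop this way only gives an overall rate; it does not
show which part of the parsing is slow.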

I have downloaded and built biojava-live and am game to tweak things. Is
there any kind of profiling tool that would show where the time is
going? Also, I am using an older Solaris JVM, 1.3.0. Could this be a
factor?

Thanks!
David Dean
----
Count your blessing.