[Biojava-l] readGenbank performance

Schreiber, Mark mark.schreiber@agresearch.co.nz
Wed, 30 Oct 2002 10:09:13 +1300


Hi -

It's often hard to compare a Perl library to BioJava without knowing what
the Perl library does. BioJava does a reasonable amount of checking (that
the symbols used match the alphabet, etc.) and does most of its work on
Symbols as Objects; the Perl library probably does everything as Strings.

You can cut down on overhead if you only want a particular part of each
record. Matthew and I were just discussing how a custom listener for a
particular field in a file can perform as fast as grep. If you are only
interested in the sequence information, for example, you could ignore all
the rest, which by default gets processed and stored as annotations and
features of the object.
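
To make the idea concrete, here is a rough sketch in plain Java. It is not
the BioJava listener API: SequenceOnlyGenbankReader, SequenceHandler and
scan() are made-up names for illustration, and it assumes the usual GenBank
flat-file layout (an ACCESSION line, an ORIGIN line before the residues, and
"//" ending each record). It reads line by line, hands only the sequence to
a callback, and skips everything else. It also does none of the alphabet
checking mentioned above, which is part of why this style can be fast.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

public class SequenceOnlyGenbankReader {

    /** Callback for the one field we care about (hypothetical, not a BioJava interface). */
    public interface SequenceHandler {
        void sequence(String accession, String residues);
    }

    /**
     * Streams through GenBank flat-file text and reports only the residues
     * of each record. Returns the number of records that had a sequence.
     */
    public static int scan(BufferedReader in, SequenceHandler handler) throws IOException {
        String accession = null;
        StringBuilder residues = null; // non-null only while inside an ORIGIN block
        int count = 0;
        String line;
        while ((line = in.readLine()) != null) {
            if (line.startsWith("//")) {
                // End of record: report the sequence, if one was seen.
                if (residues != null) {
                    handler.sequence(accession, residues.toString());
                    count++;
                }
                accession = null;
                residues = null;
            } else if (residues != null) {
                // Sequence line: keep letters, drop position numbers and spaces.
                for (int i = 0; i < line.length(); i++) {
                    char c = line.charAt(i);
                    if (Character.isLetter(c)) {
                        residues.append(c);
                    }
                }
            } else if (line.startsWith("ORIGIN")) {
                residues = new StringBuilder();
            } else if (line.startsWith("ACCESSION")) {
                String[] parts = line.trim().split("\\s+");
                if (parts.length > 1) {
                    accession = parts[1];
                }
            }
            // Every other line (features, references, ...) is skipped unparsed.
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Tiny made-up record, just enough to exercise the scanner.
        String record =
              "LOCUS       TEST01    12 bp\n"
            + "ACCESSION   TEST01\n"
            + "FEATURES             Location/Qualifiers\n"
            + "ORIGIN\n"
            + "        1 acgtacgtac gt\n"
            + "//\n";
        scan(new BufferedReader(new StringReader(record)),
             (acc, seq) -> System.out.println(acc + " " + seq.length()));
    }
}
```

A listener written against BioJava's own parser would follow the same
pattern: act on the sequence events and leave the annotation and feature
events as no-ops.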

- Mark


> -----Original Message-----
> From: David P Dean [mailto:deandp@groton.pfizer.com] 
> Sent: Wednesday, 30 October 2002 10:02 a.m.
> To: biojava-l@biojava.org
> Subject: [Biojava-l] readGenbank performance
> 
> 
> Hi,
> I'm new to BioJava and am very keen to learn more about it. 
> I've got a routine to read some Genbank sequences and do 
> stuff and that works fine. But I'm surprised it doesn't run 
> faster. A basic read loop like:
> 
>      SequenceIterator sit = SeqIOTools.readGenbank(br);
>      while( sit.hasNext() ) {
>         Sequence entry = sit.nextSequence();
>         // ... do stuff with entry ...
>      }
> 
> takes about 90 seconds to read 10,000 Genbank EST entries on 
> my Sparc Ultra 10. A comparable perl library I have that 
> iterates over the set and parses all the records takes about 
> half the time. Is this expected, or any suggestions?
> 
> I have downloaded and built biojava-live and am game to tweak 
> things. Is there any kind of profiling tool that would show 
> where the time is going? Also, I am using an older Solaris 
> JVM, 1.3.0. Could this be a factor?
> 
> Thanks!
> David Dean
> ----
> Count your blessing.
> 