[Biojava-dev] parsing bottleneck

Matthew Pocock matthew_pocock at yahoo.co.uk
Wed Mar 5 13:54:26 EST 2003


Down to 7m0s, which is approx. 5.2MB of text per second,
~20% faster. Now the profile is mainly things we can't
do anything about
(sun.nio.cs.UTF_8$Decoder.decodeArrayLoop,
java.lang.String.<init>,
java.io.BufferedReader.readLine). Still running at
100% CPU, though.
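If the profile is now dominated by UTF-8 decoding, String construction and readLine(), the parser may be close to the cost of simply reading the file. One way to check that floor is to time a loop that reads every line and does nothing else. This is a minimal sketch; the class and method names (ReadLineBaseline, scan) are hypothetical and not part of BioJava.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical baseline (not BioJava code): read every line and do
// nothing else, to estimate the decode + line-splitting floor.
public class ReadLineBaseline {

    // Returns {lineCount, charCount}; line terminators are not counted.
    static long[] scan(Reader in) throws IOException {
        BufferedReader br = new BufferedReader(in);
        long lines = 0;
        long chars = 0;
        String line;
        while ((line = br.readLine()) != null) {
            lines++;
            chars += line.length();
        }
        return new long[] {lines, chars};
    }

    public static void main(String[] args) throws IOException {
        long start = System.currentTimeMillis();
        // Tiny in-memory stand-in for a real GenBank file.
        long[] counts = scan(new StringReader("LOCUS       X\nFEATURES\n//\n"));
        long elapsed = System.currentTimeMillis() - start;
        System.out.println(counts[0] + " lines, " + counts[1]
                + " chars in " + elapsed + " ms");
    }
}
```

For the real file, pass something like `new InputStreamReader(new FileInputStream("rscu.gbff"), "UTF-8")` instead of the StringReader. If the bare loop takes most of the 7 minutes, there is little left to gain in the parser itself.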

Fingers crossed I haven't broken anything on the way.
Could someone who cares check that their favorite
GenBank/EMBL file still parses as before?

Matthew

--- Matthew Pocock <matthew_pocock at yahoo.co.uk> wrote:
> Hi,
> 
> I've just run refseq through our parsers. It takes me
> 8m30s to process the 2.2GB GenBank-formatted file
> rscu.gbff, and the process uses between 180 and 200MB
> due to some whole Arabidopsis genomes being in there.
> 
> Top thinks the process is running at pretty much 100%
> CPU for all that time. -Xprof reckons 16% of this is
> in StringBuffer.charAt(), which I presume is being
> called in the FeatureTableParser class.
> 
> I'm going to have a tinker. If anybody has ideas,
> please tell me.
> 
> Matthew
> 
>  
> 
> __________________________________________________
> Do You Yahoo!?
> Everything you'll ever need on one web page
> from News and Sport to Email and Music Charts
> http://uk.my.yahoo.com
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at biojava.org
> http://biojava.org/mailman/listinfo/biojava-dev 
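On the quoted 16% in StringBuffer.charAt(): StringBuffer's methods, charAt() included, are synchronized, so calling charAt() once per character in a tight loop pays for a lock on every call. A common workaround is to copy the buffer out once with getChars() and scan the resulting char[]. The sketch below shows that technique only. It is not necessarily what was changed in FeatureTableParser, and the names BufferScan, snapshot and firstNonSpace are hypothetical.

```java
// Hypothetical helpers (not BioJava code) illustrating "copy once, scan
// the array" instead of one synchronized charAt() call per character.
public class BufferScan {

    // Copy the buffer's contents out with a single getChars() call.
    static char[] snapshot(StringBuffer sb) {
        int len = sb.length();
        char[] chars = new char[len];
        sb.getChars(0, len, chars, 0);
        return chars;
    }

    // Index of the first non-space char at or after 'from', or -1 if none.
    static int firstNonSpace(char[] chars, int from) {
        for (int i = from; i < chars.length; i++) {
            if (chars[i] != ' ') {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        StringBuffer sb = new StringBuffer("   CDS   join(1..5)");
        char[] line = snapshot(sb);                 // copy once per line
        System.out.println(firstNonSpace(line, 0)); // 3 (start of "CDS")
        System.out.println(firstNonSpace(line, 6)); // 9 (start of "join")
    }
}
```

The copy costs O(n) per line, so it only pays off when the same line is then scanned several times; calling snapshot() inside a per-character loop would be worse than charAt().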
