[Biojava-l] regex performance in Java

P. Troshin to.petr at gmail.com
Wed Oct 24 17:59:19 UTC 2012

Hi Hilmar,

You have not mentioned which JVM version you are using, but it appears
that there is a massive difference in timing on my machine.
Here are the timings from a run on Windows 7 (Pro, 64-bit) with the
Oracle JVM (64-bit) v. 1.7.0_02.

# of Iteration: 1
Time: 1.711E-6 seconds

# of Iteration: 10
Time: 1.711E-6 seconds

# of Iteration: 100
Time: 2.567E-6 seconds

# of Iteration: 1000
Time: 1.2403E-5 seconds

# of Iteration: 10000
Time: 1.44143E-4 seconds

# of Iteration: 100000
Time: 0.001369138 seconds

I have not changed the code at all.
I have a 3-year-old laptop with an Intel Core Duo P8600, 2.4 GHz CPU,
so nothing special.
I cannot tell whether this is slow or not, as you did not publish the
timings for Perl. Could you please do so?

It looks to me as if you might just need to update or replace your JVM.
I will be happy to look at the code in a bit more detail if this result
is still slower than Perl.
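
For comparing numbers like the ones above across JVMs, here is a minimal
timing sketch. It is not the code from the gist: the class name
RegexTiming, the pattern and the input string are made up for
illustration. It assumes the pattern is compiled once outside the timed
loop and that a warm-up pass runs first, since JIT compilation can
otherwise dominate the small iteration counts.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical benchmark harness; not the code from the linked gist.
class RegexTiming {

    // Scans the input `iterations` times and returns the total number
    // of matches found across all passes.
    public static long countMatches(Pattern pattern, String input, int iterations) {
        long total = 0;
        Matcher matcher = pattern.matcher(input);
        for (int i = 0; i < iterations; i++) {
            matcher.reset(); // start each pass from the beginning of the input
            while (matcher.find()) {
                total++;
            }
        }
        return total;
    }

    // Returns the wall-clock time in seconds for `iterations` passes.
    public static double timeSeconds(Pattern pattern, String input, int iterations) {
        long start = System.nanoTime();
        long matches = countMatches(pattern, input, iterations);
        long elapsed = System.nanoTime() - start;
        // Touch the result so the counting loop has an observable use.
        if (matches < 0) {
            throw new IllegalStateException("negative match count");
        }
        return elapsed / 1e9;
    }

    public static void main(String[] args) {
        // Assumed pattern and input, for illustration only.
        Pattern pattern = Pattern.compile("[ACGT]{3}N+");
        String input = "ACGTNNACGTTTNACG";

        // Warm-up pass so the JIT has a chance to compile the hot loop.
        countMatches(pattern, input, 100_000);

        for (int n = 1; n <= 100_000; n *= 10) {
            System.out.println("# of Iterations: " + n);
            System.out.println("Time: " + timeSeconds(pattern, input, n) + " seconds");
            System.out.println();
        }
    }
}
```

Running the same class under different JVMs (and printing
System.getProperty("java.version") alongside) would make the timings
directly comparable. Very short runs are still noisy, so the larger
iteration counts are the more trustworthy ones.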


On 24 October 2012 17:47, Hilmar Lapp <hlapp at drycafe.net> wrote:
> Hi everyone,
> Thanks for all your responses. Indeed I know that the Java regex API isn't an enjoyable one to program with, and if the underlying task were about writing something from scratch, I'd be all for avoiding regex's too if the same thing could be achieved by string comparison.
> However, and of course I failed to say that initially, the task from which this query is originating is about converting a Perl script to Java (not because Perl is somehow bad, but because those Perl scripts have shown to be an obstacle to easy cross-platform installation of the - mostly Java - software they are a part of). That doesn't mean one couldn't in the course also rewrite the code that uses regular expressions to one that doesn't, but I also think it wise not to introduce multiple variables as a source of error at once.
> Some of the responses would be best answered by looking at the expressions and the code that uses them, so here are the two "benchmark" scripts.
> Java: https://gist.github.com/3940931
> Perl: https://gist.github.com/3940780
> I'm also copying Dongye Meng here, who is a CS student at UNC working with us on the project - if anyone has further wisdom to share about how to reduce the performance gap between the two versions, he'd surely appreciate it.
>         -hilmar
> On Oct 23, 2012, at 6:42 AM, Phillip Lord wrote:
>> Hilmar Lapp <hlapp at drycafe.net> writes:
>>> They (at least as in java.util.regex) have been reported to me as
>>> performing much slower (by several orders of magnitude) than the regex
>>> implementation in Perl, and some simple benchmarking tests seem to
>>> bear that out. Even after scrutinizing the benchmark and finding
>>> nothing obvious, I'm still skeptical as to why this would be the case
>>> - naively I would have assumed that the underlying runtime library is
>>> implemented in C in both cases. But perhaps this is not true?
>> Well, the difference is that Perl is perl, while Java is not; it all
>> depends on the JVM, and libraries also. A quick shuftie at
>> the source for the open-jdk libraries suggests that the regexp searching
>> is done in Java -- it's not just a drop through to C. Always the problem
>> with performance optimisation on Java -- you are only optimising for one
>> situation. It might be interesting to see how much variation there is
>> between JVMs.
>> Like others, I would only use regexp as a last resort in Java anyway;
>> compared to Perl, writing the code is painful. Still, I guess that you
>> know this!
>> Phil
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
> ===========================================================
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l