[Biojava-l] Fwd: regex performance in Java

P. Troshin to.petr at gmail.com
Mon Oct 22 20:48:02 UTC 2012


Sorry, I should have written to the list. I also just want to say that
I agree with Andreas: in Java we use regexps only when everything else
fails :-)

Regards,
P.



---------- Forwarded message ----------
From: P. Troshin <to.petr at gmail.com>
Date: 22 October 2012 21:44
Subject: Re: [Biojava-l] regex performance in Java
To: Hilmar Lapp <hlapp at drycafe.net>


Hi Hilmar,

I think this is one of the myths; I do not think there is a
difference. It might have been true long ago, but I doubt it is
still the case. Last time we compared Perl, Python and Java
performance, Perl came last by a large margin :-). However,
I have never had to compare regexp performance directly. Googling for
"perl vs java regexp speed comparison" brings up a few links. I had a
quick look at one result only
(http://onlyjob.blogspot.co.uk/2011/03/perl5-python-ruby-php-c-c-lua-tcl.html),
which claimed that Perl regexps are faster than Java's. Unfortunately
the author of the test clearly lacked understanding of Java, and as a
result the test measured the performance of String concatenation
(which is notoriously slow in Java, because Strings are immutable and
every concatenation creates a new String) rather than the regexp
performance itself. I guess this is an easy mistake to make, though.
Hence the advice: if you are doing a lot of String manipulation, use
the StringBuilder class, not String itself.
If you have a Java implementation that is too slow, I am sure people on
this list will have no problem optimizing it!
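
To make the concatenation point concrete, here is a minimal sketch. It is
not the code from the linked benchmark; the class name, method names and
the test input are all made up for illustration. Both methods run the same
precompiled regexp over the same input and differ only in how they collect
the matches, so any timing gap between them comes from String handling,
not from java.util.regex. The timings in main are only a rough indication
(no JIT warm-up, single run).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative, not from the thread.
final class RegexConcatDemo {

    // Compiled once and reused, so compilation cost is not measured per call.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Slow variant: each += copies everything collected so far
    // into a brand-new String object.
    static String collectWithString(String input) {
        String out = "";
        Matcher m = DIGITS.matcher(input);
        while (m.find()) {
            out += m.group() + ",";
        }
        return out;
    }

    // Faster variant: StringBuilder appends into a growable buffer.
    static String collectWithBuilder(String input) {
        StringBuilder out = new StringBuilder();
        Matcher m = DIGITS.matcher(input);
        while (m.find()) {
            out.append(m.group()).append(',');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Build an assumed test input: "id0 id1 id2 ..." with 5000 tokens.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 5000; i++) {
            sb.append("id").append(i).append(' ');
        }
        String input = sb.toString();

        long t0 = System.nanoTime();
        String a = collectWithString(input);
        long t1 = System.nanoTime();
        String b = collectWithBuilder(input);
        long t2 = System.nanoTime();

        System.out.println("same result:   " + a.equals(b));
        System.out.println("String +=:     " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("StringBuilder: " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```

The regexp work is identical in both methods, so a benchmark that uses the
first style mostly measures String copying.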

Regards,
Peter



On 22 October 2012 15:52, Hilmar Lapp <hlapp at drycafe.net> wrote:
> I know that this is really a Java language topic, but since parsing biological data formats is so rife with regular expression applications, I'm curious what the experience is among the Biojava people with the use of regular expressions in Java.
>
> They (at least as in java.util.regex) have been reported to me as performing much slower (by several orders of magnitude) than the regex implementation in Perl, and some simple benchmarking tests seem to bear that out. Even after scrutinizing the benchmark and finding nothing obvious, I'm still skeptical as to why this would be the case - naively I would have assumed that the underlying runtime library is implemented in C in both cases. But perhaps this is not true?
>
> Any experience people have had here speed-wise (or tricks or things not to do with Java regexes) would be appreciated.
>
>         -hilmar
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at drycafe dot net :
> ===========================================================
>
>
>
>
>
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l



