[Biojava-l] regex performance in Java

daniel.quest at gmail.com
Tue Oct 23 04:51:37 UTC 2012


Wow!  This could open a huge flame war.  Let me just make a couple of quick points about performance.

Perl is implemented in C and is interpreted, while Java compiles to bytecode that runs on top of the JVM.  JVM vendors probably implement the bytecode instruction set in C/assembly.  Java itself, at least at this point, is most likely written largely in Java.  The speed of Java is greatly influenced by the underlying JVM and by how well the JVM instruction set maps to the hardware.  The algorithm being implemented and the version of Java also have a great impact on performance.  Conventional wisdom is that Fortran is the best-performing language in widespread use, with interpreted languages such as Python, Ruby, and Perl being 3-8 times slower.  This website shows Java having about a ten percent overhead relative to C:  http://shootout.alioth.debian.org/

I have personally noticed that Perl's regex handling performs better than Python's.  I have never noticed a difference between Perl and Java so extreme that I would choose to implement something in Perl rather than Java in a production setting.  Java has such deep library support that it makes almost every other language look like a second-class citizen in comparison (notable exceptions: C, C++, and JavaScript).

Something else interesting: http://swtch.com/~rsc/regexp/regexp1.html

Finally, be very cautious of benchmarks.  It is very hard to do benchmarking well.
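
To illustrate why this is hard on the JVM in particular, here is a minimal sketch of timing a regex scan in Java.  It assumes a made-up motif (GA[ACGT]TC) and a synthetic sequence; the class and method names are hypothetical and not part of BioJava.  The two things it tries to get right are compiling the Pattern once outside the timed loop and running a warm-up pass so the JIT has a chance to compile the hot code before anything is measured.  It is only a sketch; a harness such as JMH handles many more pitfalls than this does.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: names, motif, and sizes are illustrative assumptions.
final class RegexTimingSketch {

    // Compile once. Recompiling inside the loop would time the regex
    // compiler rather than the matcher.
    static final Pattern MOTIF = Pattern.compile("GA[ACGT]TC");

    // Count non-overlapping matches of the pattern in the input.
    static int countMatches(Pattern pattern, CharSequence input) {
        Matcher m = pattern.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Build a deterministic DNA-like string so every run sees the same data.
    static String syntheticSequence(int length) {
        char[] bases = {'A', 'C', 'G', 'T'};
        StringBuilder sb = new StringBuilder(length);
        long state = 42L;
        for (int i = 0; i < length; i++) {
            // Simple linear congruential step; the top two bits pick a base.
            state = state * 6364136223846793005L + 1442695040888963407L;
            sb.append(bases[(int) (state >>> 62)]);
        }
        return sb.toString();
    }

    // Time `rounds` full scans and return the elapsed nanoseconds.
    static long timeNanos(Pattern pattern, String input, int rounds) {
        long sink = 0;
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            sink += countMatches(pattern, input);
        }
        long elapsed = System.nanoTime() - start;
        // Use the result so the JIT cannot discard the loop as dead code.
        if (sink < 0) {
            throw new IllegalStateException("unreachable");
        }
        return elapsed;
    }

    public static void main(String[] args) {
        String seq = syntheticSequence(1_000_000);
        int rounds = 20;
        long cold = timeNanos(MOTIF, seq, rounds);  // includes JIT warm-up
        long warm = timeNanos(MOTIF, seq, rounds);  // closer to steady state
        System.out.printf("cold: %.2f ms/scan%n", cold / 1e6 / rounds);
        System.out.printf("warm: %.2f ms/scan%n", warm / 1e6 / rounds);
    }
}
```

The cold and warm numbers usually differ, and both vary with the JVM version, the flags, and the machine, so a single run like this says very little on its own.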
Dan 

On Oct 22, 2012, at 3:53 PM, Tiago Antão <tiagoantao at gmail.com> wrote:

> On Mon, Oct 22, 2012 at 9:42 PM, Andreas Prlic <andreas at sdsc.edu> wrote:
>> and used if nothing else works... Not sure if anybody else has a
>> different experience?
> 
> I might be beating a dead horse here, but I agree. I would say that
> from an idiomatic perspective Perl uses a lot of regex programming
> (Ruby also?), which is less common in most other languages (Java and
> Python are my work case). Regexes exist but are not the first option.
> That being said, there is a very cool JVM language which has regexes
> as first class objects: Clojure. But even in that case, I do not see
> lots of idiomatic use of regexes.
> 
> Tiago



