[Bioperl-l] Bio-Perl workload for benchmarking Perl

Tue Apr 6 22:35:54 UTC 2010

Robert Bradbury <robert.bradbury at gmail.com> writes:
> Creating genome indexes usually only takes a couple of minutes.
> Single gene blast homology searches (which are generally done in C,
> not Perl) of entire genomes only take a couple of minutes.

Then I would happily take the “some minutes” workloads as well.
I only said 1h because I thought I had a choice.

> If you want something that takes an hour it's going to have to be very
> wide or very deep (low homology searches?; lots of gene searches
> across many genomes?)  The only things I can think of that require
> that much CPU are whole genome assembly (which I don't recall BioPerl
> being designed to do) or perhaps chromosomal synteny searches across
> multiple genomes (which I also don't recall BioPerl handling).

I don't fully follow those details, but for my purposes I simply trust
you. I think the workload just needs to be “typical” for Bio-Perl use
cases.

It's like with SpamAssassin: they provide a corpus of known spam/ham
mails, and I just process it as described in the README. Fortunately,
it takes a few minutes, or at least a noticeable number of seconds.

> If one is going to do Perl benchmarking one needs to identify those
> applications which are best implemented in Perl and are not
> easily/have not been re-implemented in C or driven down to the
> hardware level.

Yep. That's what I'm looking for. And some of Bio-Perl *has* to
be in Perl for a reason, doesn't it?

That's why I'm asking here. And feel free to point me to other Perl
problem domains worth asking about, if you know any.

Thanks.

Kind regards,
Steffen 
-- 
Steffen Schwigon <ss5 at renormalist.net>
Dresden Perl Mongers <http://dresden-pm.org/>