[Bioperl-l] Public Release of the Xobjects gene expression package

Todd Richmond todd@andrew2.stanford.edu
Mon, 30 Jul 2001 09:18:41 -0700


On 7/29/01 2:40 PM, "Nathan O. Siemers" <nathan@dnase.hpw.pri.bms.com>
wrote:

>       Stats.pm works, but is ugly (my first perl module years ago
>       now), and needs to be replaced by the newer stat module that
>       has finally appeared on cpan.  When Xobjects was first
>       written, the public Statistics::Descriptive was not sufficient
>       to get our work done.

It may be ugly, but is it fast? I tried using Statistics::Descriptive for
some of my microarray work, but discovered that the overhead of creating the
Statistics objects made it too slow. In the run below it came out about 7X
slower by CPU time (9X by wallclock) than a simple hand-coded subroutine,
when finding the mean of 3 values, 20000 times.

Benchmark: timing 20000 iterations of Statistics::Descriptive, mean subroutine...
Statistics::Descriptive:  9 wallclock secs ( 7.12 usr +  0.00 sys =  7.12 CPU) @ 2808.99/s (n=20000)
mean subroutine:  1 wallclock secs ( 0.99 usr +  0.00 sys =  0.99 CPU) @ 20202.02/s (n=20000)

Code follows:

#!/usr/bin/perl -w

use Statistics::Descriptive;
use Benchmark;

timethese(20000, {
'Statistics::Descriptive' => '

my @values;

for my $number (0 .. 2) {
    push(@values,rand()*10000);
}

my $stat = Statistics::Descriptive::Sparse->new();
$stat->add_data(@values);
my $mean = $stat->mean;

',
'mean subroutine' => '

my @values;

for my $number (0 .. 2) {
    push(@values,rand()*10000);
}

my $mean = mean(\@values);

',
});


# Return the arithmetic mean of the values in an array reference,
# or undef if the array is empty.
sub mean {
    my ($arrayref) = @_;
    return undef unless @$arrayref;
    my $result = 0;
    $result += $_ foreach @$arrayref;
    return $result / @$arrayref;
}
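
The same comparison can also be written with code references and
Benchmark's cmpthese, which prints a relative-speed table instead of raw
timings. This is only a minimal sketch: mean_of is a hypothetical helper
built on List::Util's sum (not the subroutine above), the one-second run
time is an arbitrary choice, and the Statistics::Descriptive case is
skipped if the module is not installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);
use List::Util qw(sum);

# Hypothetical helper (an assumption, not from the post above):
# mean of a list of numbers, or undef for an empty list.
sub mean_of {
    return undef unless @_;
    return sum(@_) / @_;
}

my %tests = (
    'mean subroutine' => sub {
        my @values = map { rand() * 10000 } 0 .. 2;
        my $mean   = mean_of(@values);
    },
);

# Only add the module-based case when Statistics::Descriptive is available.
if (eval { require Statistics::Descriptive; 1 }) {
    $tests{'Statistics::Descriptive'} = sub {
        my @values = map { rand() * 10000 } 0 .. 2;
        my $stat   = Statistics::Descriptive::Sparse->new();
        $stat->add_data(@values);
        my $mean = $stat->mean;
    };
}

# A negative count asks Benchmark to run each case for at least
# that many CPU seconds rather than a fixed number of iterations.
cmpthese(-1, \%tests);
```

Using code references keeps the benchmarked code under strict and
warnings at compile time, which string evals do not.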


-- 
Todd Richmond                    http://cellwall.stanford.edu/todd
Carnegie Institution             email: todd@andrew2.stanford.edu
Department of Plant Biology      fax: 1-650-325-6857
260 Panama Street                phone: 1-650-325-1521 x431
Stanford, CA 94305