[Bioperl-l] Performance of Bio::Species
Stefan Kirov
stefan.kirov at bms.com
Tue Nov 21 17:20:34 UTC 2006
The new Bio::Species implementation seems to degrade performance
significantly. The slowdown appears to happen when the Bio::Tree::Tree is
constructed. See the stats below (based on simple Bio::Species object
construction; script and test sequence file attached):
10000 iterations
new implementation:
Constructor: 115 wallclock secs (113.50 usr + 0.67 sys = 114.17 CPU)
Accessor: 0 wallclock secs ( 0.17 usr + 0.00 sys = 0.17 CPU)
old implementation (bioperl-1.4):
Constructor: 1 wallclock secs ( 0.84 usr + 0.10 sys = 0.94 CPU)
Accessor: 0 wallclock secs ( 0.13 usr + 0.01 sys = 0.14 CPU)
You can see that when reading a GenBank file, the time needed to
construct the Bio::Seq object roughly doubles (100 iterations):
old implementation (bioperl-1.4):
Constructor: 0 wallclock secs ( 0.01 usr + 0.00 sys = 0.01 CPU)
Accessor: 0 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
Constructor(seqio)/reading seq: 3 wallclock secs ( 2.51 usr + 0.31 sys = 2.83 CPU)
new implementation:
Constructor: 2 wallclock secs ( 1.14 usr + 0.01 sys = 1.15 CPU)
Accessor: 0 wallclock secs ( 0.00 usr + 0.00 sys = 0.00 CPU)
Constructor(seqio)/reading seq: 5 wallclock secs ( 5.10 usr + 0.20 sys = 5.30 CPU)
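Putting the two sets of numbers side by side (CPU seconds taken directly from the runs above; the 0.01 s constructor figure in the 100-iteration run is too coarse to give a meaningful ratio, so it is left out):

```latex
\[
\frac{t^{\mathrm{constructor}}_{\mathrm{new}}}{t^{\mathrm{constructor}}_{\mathrm{old}}}
  = \frac{114.17}{0.94} \approx 121,
\qquad
\frac{t^{\mathrm{GenBank\ read}}_{\mathrm{new}}}{t^{\mathrm{GenBank\ read}}_{\mathrm{old}}}
  = \frac{5.30}{2.83} \approx 1.87
\]
```

So object construction on its own is about two orders of magnitude slower, and a full GenBank read comes close to doubling.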
This may not be a problem for people who read only a few sequences, or
files with no lineage data, but it could be a significant headache otherwise.
I saw from CVS that Sendu knows there are memory leaks (I see reference
cycles). If the classification is supplied incorrectly (e.g. it includes a
reference to an array inside the classification array), things get really
messy (~17 GB of RAM for a single Bio::Species object), though, oddly
enough, the cycle is not infinite. If I have more time I will try to debug
this further and submit a formal bug report/patch, but I am not sure I will
get to it anytime soon. I am sure there are people who understand
Bio::Taxon/Bio::Tree::Tree better than I do and might have a better idea of
how to fix this.
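A minimal sketch of why parent/child reference cycles leak in Perl, assuming the Bio::Taxon/Bio::Tree::Tree leak is of this general kind. The Node class and build_and_drop() below are hypothetical illustrations, not BioPerl code. Perl frees data by reference counting, so a child holding a strong reference back to its parent keeps both nodes alive after the last outside reference disappears; weakening the back-link with Scalar::Util::weaken lets them be freed.

```perl
use strict;
use warnings;

package Node;    # hypothetical tree node, not a BioPerl class
use Scalar::Util qw(weaken);

our $destroyed = 0;    # counts nodes Perl actually freed

sub new {
    my ($class, $name) = @_;
    return bless { name => $name, parent => undef, children => [] }, $class;
}

sub add_child {
    my ($self, $child, $weak) = @_;
    push @{ $self->{children} }, $child;    # parent -> child (strong)
    $child->{parent} = $self;               # child -> parent closes the cycle
    weaken($child->{parent}) if $weak;      # a weak back-link breaks it
    return $child;
}

sub DESTROY { $destroyed++ }

package main;

# Build a two-node tree inside a block, let it go out of scope,
# and report how many nodes were freed at that point.
sub build_and_drop {
    my ($weak) = @_;
    $Node::destroyed = 0;
    {
        my $root = Node->new('Eukaryota');
        $root->add_child(Node->new('Metazoa'), $weak);
    }
    return $Node::destroyed;
}

print "strong back-links: ", build_and_drop(0), " nodes freed\n";
print "weak back-links:   ", build_and_drop(1), " nodes freed\n";
```

With strong back-links nothing is freed when the block exits (the nodes linger until global destruction); with a weak back-link both nodes are freed immediately.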
Stefan
///
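use strict;
use warnings;
use Benchmark;
use Bio::Species;
use Bio::SeqIO;

# Usage: perl <this script> NM_000161.genbank
my $f = shift or die "Usage: $0 <genbank file>\n";

# Lineage from the species epithet up to Eukaryota
my @classification = qw( sapiens Homo Hominidae
                         Catarrhini Primates Eutheria
                         Mammalia Vertebrata Chordata
                         Metazoa Eukaryota );

# 1. Time plain Bio::Species construction
my $species;
my $t1 = Benchmark->new;
for my $i (1 .. 100) {
    $species = Bio::Species->new(-classification => [@classification]);
}

# 2. Time a simple accessor on the last object built
my $t2 = Benchmark->new;
for my $i (1 .. 100) {
    my $bin = $species->binomial;
}
my $t3 = Benchmark->new;
print "Constructor: ", timestr(timediff($t2, $t1)), "\n";
print "Accessor: ", timestr(timediff($t3, $t2)), "\n";

# 3. Time opening the file and parsing one GenBank record per iteration;
#    parsing the SOURCE/ORGANISM lineage builds a Bio::Species as well
my $t4 = Benchmark->new;
for my $i (1 .. 100) {
    my $sio = Bio::SeqIO->new(-file => $f, -format => 'genbank');
    my $seq = $sio->next_seq;
}
my $t5 = Benchmark->new;
print "Constructor(seqio)/reading seq: ", timestr(timediff($t5, $t4)), "\n";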
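The script passes a flat list of strings. Since the report above notes that an array reference nested inside the classification list makes memory use explode, a caller could guard against that before calling Bio::Species->new. A minimal sketch; check_classification() is a hypothetical helper, not part of BioPerl:

```perl
use strict;
use warnings;

# Hypothetical guard (not a BioPerl API): die if any classification element
# is undefined or is a reference instead of a plain rank name.
# Intended use: check_classification(@classification) before
# Bio::Species->new(-classification => [@classification]).
sub check_classification {
    my @ranks = @_;
    for my $i (0 .. $#ranks) {
        my $rank = $ranks[$i];
        die "classification element $i is undefined\n" unless defined $rank;
        die "classification element $i is a ", ref($rank),
            " reference, expected a plain name\n" if ref $rank;
    }
    return 1;
}

# A flat list passes; a nested array reference is rejected
my @good = qw(sapiens Homo Hominidae);
my @bad  = ('sapiens', ['Homo', 'Hominidae']);
for my $case ([good => \@good], [bad => \@bad]) {
    my ($label, $ranks) = @$case;
    my $ok = eval { check_classification(@$ranks); 1 };
    print "$label: ", ($ok ? "accepted\n" : "rejected: $@");
}
```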
-------------- next part --------------
Attachment: NM_000161.genbank
URL: <http://lists.open-bio.org/pipermail/bioperl-l/attachments/20061121/5d9e857e/attachment.ksh>