[BioSQL-l] bioperl-db performance
DNNM (Dennis Madsen)
dnnm at novonordisk.com
Thu Nov 13 03:54:40 EST 2003
Hi,
I have seen other mails about relatively low sequence throughput
during storage (with load_seqdatabase.pl). I have
the same problem with the latest bioperl-db, bioperl 1.2.3,
perl 5.8.1 and RedHat 9 (with a newly compiled perl to avoid the UTF-8 problems in RH9).
We have tested various RDBMSs (MySQL 3.23.54a, MySQL 4.0.16 and
Oracle 9.2.0.4) on different machines with 1-2 CPUs (2.5 GHz P4), 1-2 GB of memory and
lots of disk space. But no matter what, the throughput is about
5 sequences per second. If I understand the benchmarks correctly, the
expected throughput is about 60 sequences per second on a computer half as fast.
If I start several jobs on separate machines to upload sequences
to a common db (MySQL 4.0.16), the throughput scales perfectly, so the RDBMS is not
the bottleneck.
I did the same test with biojava 1.3 + MySQL 3.23.54a (but with an older version
of BioSQL that matches biojava) and there the throughput matches the
benchmark (about 50 seqs per second).
If I profile a 10-sequence GenBank file with perl -d:DProf load_seqdatabase.pl ...
the output from dprofpp tmon.out is:
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 50.2   1.084  1.084 150388   0.0000 0.0000  overload::mycan
 12.2   0.264  1.411   4208   0.0001 0.0003  Carp::caller_info
 4.22   0.091  1.502    210   0.0004 0.0072  Carp::ret_backtrace
 3.85   0.083  1.624   3600   0.0000 0.0005  Bio::DB::BioSQL::BasePersistenceAdaptor::_create_persistent
 3.24   0.070  0.147   6174   0.0000 0.0000  Bio::DB::Persistent::PersistentObject::AUTOLOAD
 3.10   0.067  1.134   5643   0.0000 0.0002  overload::StrVal
 3.10   0.067  0.067  11722   0.0000 0.0000  UNIVERSAL::isa
 2.41   0.052  0.086   1081   0.0000 0.0001  Bio::DB::Persistent::PersistentObject::can
 2.22   0.048  0.661    151   0.0003 0.0044  Bio::Root::Root::_load_module
 1.71   0.037  0.083    140   0.0003 0.0006  Bio::DB::BioSQL::SimpleValueAdaptor::add_association
 1.67   0.036  0.036   1845   0.0000 0.0000  Bio::Root::RootI::_rearrange
 1.25   0.027  1.609   3191   0.0000 0.0005  Bio::DB::BioSQL::BasePersistenceAdaptor::_process_child
 1.20   0.026  1.772   1669   0.0000 0.0011  Bio::DB::BioSQL::BasePersistenceAdaptor::create_persistent
 1.20   0.026  0.026   2118   0.0000 0.0000  UNIVERSAL::can
 1.16   0.025  0.033   1422   0.0000 0.0000  Bio::Root::Root::new
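As a rough cross-check of the profile, the short Python sketch below estimates
the total profiled time from the top row and derives the throughput that
implies. It uses only numbers copied from the dprofpp output; the variable
names are made up for illustration, and it assumes the %Time column is each
routine's share of total exclusive time. Profiling adds overhead of its own,
so the result is an approximation only.

```python
# Figures copied from the dprofpp output (10-sequence GenBank file).
# Names below are illustrative; nothing here comes from bioperl-db itself.
excl_pct = {
    "overload::mycan": 50.2,
    "Carp::caller_info": 12.2,
    "Carp::ret_backtrace": 4.22,
    "overload::StrVal": 3.10,
}
mycan_excl_sec = 1.084   # ExclSec reported for overload::mycan
n_sequences = 10         # size of the profiled input file

# Assumption: %Time is the routine's share of total exclusive time,
# so total time = exclusive seconds / (percent / 100).
total_sec = mycan_excl_sec / (excl_pct["overload::mycan"] / 100.0)

# Throughput implied by the profiled run.
seqs_per_sec = n_sequences / total_sec

# Combined share of the Carp and overload routines listed above.
carp_overload_pct = sum(excl_pct.values())

print(f"estimated total profiled time: {total_sec:.2f} s")
print(f"implied throughput: {seqs_per_sec:.1f} sequences/s")
print(f"share in Carp/overload routines: {carp_overload_pct:.1f} %")
```

Under that assumption the profiled run works out to roughly the same 5
sequences per second seen in the real loads, and the four Carp and overload
routines together account for about 70% of the exclusive time.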
It seems that a lot of time is spent creating objects. Is my system
misconfigured, or am I doing something else wrong?
Regards, Dennis
================================
Dennis Madsen, Ph.D.
Scientific Computing, Bioinformatics Group
Novo Nordisk Park, A2P
2760 Måløv
Denmark
================================