[Bioperl-l] bioperl-db performance: load_seqdatabase.pl throughput speed
Henry R Bigelow
hrb46 at columbia.edu
Tue May 11 13:27:56 EDT 2004
Hi,
my name is Henry Bigelow and I recently installed bioperl-1.4,
bioperl-db, DBI and DBD-mysql, mysql-4.0 (with InnoDB enabled), and
biosql-schema, and instantiated biosqldb-mysql.sql. I've successfully
loaded some sequences from release43.dat, the Swiss-Prot flat file, but the
throughput is roughly 1 sequence every 5 to 10 seconds, on an (admittedly
slow) dual-CPU 400 MHz Pentium III with 256 MB of memory. I ran the command:
perl load_seqdatabase.pl --host localhost --dbname bioseqdb \
    --namespace swissprot --dbuser bigelow --dbpass XXX \
    --driver mysql --format swiss /data/swissprot/release43.dat
I also ran it (on a set of 15 swissprot entries) with a profiler:
perl -d:DProf load_seqdatabase.pl ...
then with
dprofpp -u
I got this:
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 9.62   0.800  0.985  15282   0.0001 0.0001  Bio::DB::Persistent::PersistentObject::isa
 9.54   0.793  1.403  11909   0.0001 0.0001  Bio::DB::Persistent::PersistentObject::AUTOLOAD
 9.25   0.769  3.152   8888   0.0001 0.0004  Bio::DB::BioSQL::BasePersistenceAdaptor::_create_persistent
 4.69   0.390  2.922   7733   0.0001 0.0004  Bio::DB::BioSQL::BasePersistentAdaptor::_process_child
 4.59   0.382  0.382  26865   0.0000 0.0000  Bio::DB::Persistent::PersistentObject::obj
 3.84   0.319  0.319  32822   0.0000 0.0000  UNIVERSAL::isa
 3.69   0.307  0.372     86   0.0036 0.0043  Bio::DB::BioSQL::ReferenceAdaptor::_crc64
 3.28   0.273  1.195    258   0.0011 0.0046  Bio::Root::Root::_load_module
 2.80   0.233  3.545   5465   0.0000 0.0006  Bio::DB::BioSQL::BasePersistenceAdaptor::create_persistent
 2.74   0.228  0.228    291   0.0008 0.0008  Bio::Root::RootI::stack_trace
 1.92   0.160  0.160   1794   0.0001 0.0001  DBI::st::execute
 1.84   0.153  0.534   1608   0.0001 0.0003  Bio::DB::Persistent::PersistentObject::new
 1.80   0.150  0.150   7215   0.0000 0.0000  Bio::DB::Persistent::PersistentObject::primary_key
 1.74   0.145  0.185   2640   0.0001 0.0001  Bio::Root::Root::new
 1.71   0.142  1.078    474   0.0003 0.0023  Bio::DB::BioSQL::BaseDriver::insert_object
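For reference, a small test set like the 15 entries profiled here can be cut from the flat file with a few lines of awk. This is only a sketch: the function name first_entries and the output file name are made up for illustration, and it assumes that every Swiss-Prot entry ends with a line beginning with "//".

```shell
#!/bin/sh
# first_entries N: copy the first N entries of a Swiss-Prot flat file
# from stdin to stdout. Hypothetical helper, not part of bioperl-db.
first_entries() {
    awk -v max="$1" '
        { print }                              # pass every line through
        /^\/\// { if (++n >= max) exit }       # "//" closes an entry; stop after N
    '
}

# Example call (the output file name is an assumption):
#   first_entries 15 < /data/swissprot/release43.dat > subset15.dat
```

The resulting file can then be passed to load_seqdatabase.pl in place of the full release.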
I do realize that these Perl objects are large, but it still seems quite
slow. (I'm not even sure whether the profile shows that the majority of
the time is spent instantiating Perl objects as opposed to executing
MySQL statements.)
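A rough way to approach that question is to add up the exclusive seconds in the dprofpp listing for database-driver calls (names starting with DBI:: or DBD::) and for everything else. The sketch below does this with awk; summarize_dprof is a made-up helper name, and it accepts rows whether the subroutine name sits on the same line as the numbers or is wrapped onto the next one. One caveat: "dprofpp -u" reports user CPU time of the Perl process only, so time spent inside the MySQL server or waiting on disk would not show up in these numbers; "dprofpp -r" (elapsed real times) may be more telling for that.

```shell
#!/bin/sh
# summarize_dprof: read dprofpp output on stdin and total the ExclSec
# column for DBI::/DBD:: subroutines versus all other subroutines.
# Hypothetical helper for illustration only.
summarize_dprof() {
    awk '
        function add(name, secs) {
            if (name ~ /^(DBI|DBD)::/) db += secs; else other += secs
        }
        # A data row starts with a numeric %Time and has 6 numeric columns.
        $1 ~ /^[0-9.]+$/ && NF >= 6 {
            if (NF >= 7) add($7, $2)               # name on the same line
            else { pending = 1; excl = $2 }        # name wrapped to next line
            next
        }
        pending { add($1, excl); pending = 0 }
        END { printf "db=%.3f other=%.3f\n", db, other }
    '
}

# Demo on three rows taken from the listing in this message:
printf '%s\n' \
    '3.84 0.319 0.319 32822 0.0000 0.0000 UNIVERSAL::isa' \
    '1.92 0.160 0.160 1794 0.0001 0.0001 DBI::st::execute' \
    '1.74 0.145 0.185 2640 0.0001 0.0001 Bio::Root::Root::new' \
    | summarize_dprof
# prints: db=0.160 other=0.464
```

On a real run the whole listing would be piped in, for example "dprofpp -u | summarize_dprof".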
All bioperl-db, bioperl, DBI and DBD-mysql tests passed (the vast
majority of them, anyway).
Incidentally, I spent a week getting errors from load_seqdatabase.pl
before I discovered the true cause: a perl executable built with
threading enabled does NOT work with this setup. (The author of DBD-mysql
or DBI warns about this, but I didn't heed the warning at first.)
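Whether a given perl was built with thread support can be read from its build configuration with "perl -V:usethreads". A minimal sketch follows; the helper name is_threaded_output is hypothetical, and the check only reports how perl was built, not whether a particular DBI or DBD-mysql version is affected.

```shell
#!/bin/sh
# is_threaded_output STRING: succeed if STRING, as printed by
# `perl -V:usethreads`, reports a threaded build.
# The command prints usethreads='define'; for a threaded perl and
# usethreads='undef'; otherwise. Hypothetical helper for illustration.
is_threaded_output() {
    case "$1" in
        *"='define'"*) return 0 ;;
        *)             return 1 ;;
    esac
}

# Report on the perl found in PATH, if there is one.
if command -v perl >/dev/null 2>&1; then
    if is_threaded_output "$(perl -V:usethreads)"; then
        echo "perl was built with threads"
    else
        echo "perl was built without threads"
    fi
fi
```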
If anyone has any ideas about what might be making it slow, please let me
know! I'd greatly appreciate it.
Sincerely,
Henry Bigelow