[Bioperl-l] load_seqdatabase.pl running SLOW!

Tue Jan 25 18:15:52 EST 2005

Hilmar (or others)-

I've set up a BioSQL-based database using PostgreSQL 7.2 on a PC with an 
Intel Pentium 4 3.0 GHz processor, an 800 MHz system bus, 1 GB of RAM, and 
Linux (2.2 kernel - Debian woody distro).  Onto that I am loading 
~352,000 sequences from the RefSeq complete RNA collection using 
load_seqdatabase.pl.  It's running kind of slow - loading on average 
about 1 sequence every 2-5 seconds.  In the archives I've read your 
comments to a previous question like this suggesting two fast 
processors, a couple gigs of memory and 2-3 drives to really make things 
fly, and while my system isn't that good, it seems like I should be doing 
better.  I got to experimenting on another (slower) system while waiting 
for things to load, and found that running the same script to load the 
same file goes about 3X faster on a 266 MHz Intel processor with 192 MB 
RAM.  Same installation of PostgreSQL (both installed from deb package 
with defaults), and same installation of Debian Linux (except that the 
kernel on the older slow machine has been updated to 2.4).  Another 
difference I noticed between the two is that the old 266 MHz machine is 
using about 75% CPU resources for perl and about 25% for postmaster 
whereas the faster 3 GHz machine (but slower running 
load_seqdatabase.pl) is using 95% of its CPU resources for postmaster 
and about 3% for perl.  Both systems are using up most of their memory, 
but little to no swap.  Could the kernel upgrade really be making the 
difference?  Any thoughts?  As it's going now I can wait over a week for 
all these sequences to load, or build the database on our dinosaur 
server in a couple of days and dump it across to our sexy new 3 GHz 
server.  Talk about bass ackwards!
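
For a rough sense of scale, here is a back-of-the-envelope sketch of the 
total load time at the rates quoted above.  It assumes the per-sequence 
rate stays constant for the whole run, which may well not hold as the 
tables grow; the function name load_days is just made up for illustration.

```python
# Rough load-time estimate; assumes a constant per-sequence cost,
# which is a simplification (insert speed may change as tables grow).
SECONDS_PER_DAY = 86_400


def load_days(n_sequences: int, seconds_per_sequence: float) -> float:
    """Days needed to load n_sequences at a fixed per-sequence cost."""
    return n_sequences * seconds_per_sequence / SECONDS_PER_DAY


n = 352_000  # approximate sequence count quoted above
fast, slow = load_days(n, 2), load_days(n, 5)
print(f"{fast:.1f} to {slow:.1f} days")  # prints "8.1 to 20.4 days"
```

So at 2-5 seconds per sequence the full load lands somewhere between 
roughly 8 and 20 days on the 3 GHz machine.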

Barry

-- 
Barry Moore
Dept. of Human Genetics
University of Utah
Salt Lake City, UT