[Bioperl-l] bioperl-db performance

Mon Sep 6 01:30:00 EDT 2004

Hi,

I have a project based on bioperl-db. Until now I have been 
using the old (bioperl-1.1 branch) version of the code and schema, but 
it is becoming unacceptable (mainly because of the way taxonomy is 
stored), so I decided to upgrade to the current version. The new code 
is a huge leap forward in terms of design, clarity and consistency. 
However, I am experiencing severe performance problems.

For example, retrieving a locally stored GenPept entry consistently 
takes 16-17 seconds (by primary or unique key, it doesn't matter), 
compared with the 2-3 seconds it takes to get it directly from SRS using 
Bio::DB::GenBank, or about 1 second from the old bioperl-db. Also, getting a 
species object (I use them a lot) from a local database (new 
bioperl-db) that contains nothing but an import of the NCBI taxonomy takes 
more than 15 seconds, compared with under 1 second with the old bioperl-db. 
In both cases I use MySQL 4.0.16 on a dual 866 MHz PowerPC G4 with 768 MB 
of RAM.

So, my questions are:
1. Is this performance drop an expected behavior (due to increased 
complexity of the code and new schema)?
2. If the answer to (1) is yes, then what is the way to improve it and 
how big an improvement can be achieved?
3. If the answer to (1) is no, where should I look for the source of 
the problem?

There was a related question on this list in May 2004, but it concerned 
sequence loading performance on a significantly slower machine, and the 
suggestion was to add more horsepower.

Thanks in advance!

Regards,
Alex.