[Bioperl-l] bioperl-db performance

Tue Sep 7 00:58:24 EDT 2004

Hi again,

Ok, I found one bottleneck which, if removed, significantly speeds up
species (and hence sequence) object retrieval from bioperl-db, at least
in my case (mysql 4.0.20). For reasons I don't understand, the SQL
statement in
Bio::DB::BioSQL::mysql::SpeciesAdaptorDriver::get_classification runs
about 10x faster if " ORDER BY node.left_value" is removed from it and
the classification array is sorted in perl instead (left_value then has
to be included in the list of returned fields). An even bigger
speedup (from >16 seconds to <1 second) can be achieved by replacing the
complex query with a dumb perl loop that fetches parent nodes one by one
with a simple select by primary key.
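
To make the two workarounds concrete, here is a minimal sketch. It is
not the bioperl-db code: it is written in Python against an in-memory
SQLite database purely so that it is self-contained, and the single
simplified "taxon" table, the demo rows, the root-first ordering and
the function names are all assumptions made for illustration.

```python
import sqlite3


def build_demo_db():
    """Create a tiny nested-set taxonomy (hypothetical, simplified schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE taxon ("
        " taxon_id INTEGER PRIMARY KEY,"
        " parent_taxon_id INTEGER,"
        " name TEXT,"
        " left_value INTEGER,"
        " right_value INTEGER)"
    )
    rows = [
        (1, None, "root", 1, 10),
        (2, 1, "Eukaryota", 2, 9),
        (3, 2, "Metazoa", 3, 6),
        (4, 3, "Homo sapiens", 4, 5),
        (5, 2, "Fungi", 7, 8),
    ]
    conn.executemany("INSERT INTO taxon VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def lineage_sorted_in_client(conn, taxon_id):
    """Workaround 1: no ORDER BY in SQL; fetch left_value and sort in code."""
    rows = conn.execute(
        "SELECT node.name, node.left_value"
        " FROM taxon AS leaf, taxon AS node"
        " WHERE leaf.taxon_id = ?"
        " AND leaf.left_value BETWEEN node.left_value AND node.right_value",
        (taxon_id,),
    ).fetchall()
    # Ancestors have smaller left_value, so this yields root-first order.
    rows.sort(key=lambda row: row[1])
    return [name for name, _left in rows]


def lineage_by_parent_loop(conn, taxon_id):
    """Workaround 2: walk up the tree, one primary-key lookup per node."""
    names = []
    current = taxon_id
    while current is not None:
        row = conn.execute(
            "SELECT name, parent_taxon_id FROM taxon WHERE taxon_id = ?",
            (current,),
        ).fetchone()
        if row is None:
            break  # dangling parent reference; stop rather than fail
        name, parent = row
        names.append(name)
        if parent == current:
            break  # guard against a root row that lists itself as parent
        current = parent
    names.reverse()  # collected leaf-first; return root-first
    return names


if __name__ == "__main__":
    conn = build_demo_db()
    print(lineage_sorted_in_client(conn, 4))
    print(lineage_by_parent_loop(conn, 4))
```

Both functions return the same lineage; the second trades one join
for as many primary-key selects as there are levels in the lineage.
Whether either is faster on a given server would have to be measured.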

I didn't check whether this behavior is specific to mysql or the 
particular versions of it that I have (4.0.16 and 4.0.20), but since 
mysql is popular and 4.0.20 is the current production version, I think 
it's better to fix this.

Regards,
Alex.

On 06/09/2004, at 3:30 PM, Alex Zelensky wrote:

> Hi,
>
> I have a project which is based on bioperl-db. Until now I've been 
> using the old (bioperl-1.1 branch) version of the code and schema, but 
> it is becoming unacceptable (mainly because of the way taxonomy is 
> stored), so I decided to upgrade to the current version.  The new code 
> is a huge leap forward in terms of design, clarity and consistency. 
> However, I am experiencing severe performance problems.
>
> For example, retrieving a locally stored GenPept entry consistently 
> takes 16-17 seconds (by primary or unique key, it doesn't matter), 
> compared to the 2-3 seconds it takes to get it directly from SRS 
> using Bio::DB::GenBank, or ~1 second from the old bioperl-db. Also, 
> getting a species object (I use them a lot) from a local database 
> (new bioperl-db) that contains nothing but an import of the NCBI 
> taxonomy takes >15 seconds, compared to <1 second with the old 
> bioperl-db. In both cases I use mysql 4.0.16 on a dual 866 MHz 
> PowerPC G4 with 768 MB RAM.
>
> So, my questions are:
> 1. Is this performance drop an expected behavior (due to increased 
> complexity of the code and new schema)?
> 2. If the answer to (1) is yes, then what is the way to improve it and 
> how big an improvement can be achieved?
> 3. If the answer to (1) is no, where should I look for my problem 
> source?
>
> There was a related question on this list in May 2004, but it 
> described sequence loading performance on a significantly slower 
> machine, and the suggestion was to increase the horsepower.
>
> Thanks in advance!
>
> Regards,
> Alex.