[Bioperl-l] Comparing DB_FILE and SDBM

Thu Aug 12 17:10:48 EDT 2004

On Thu 08/12/04 13:44, Mike Muratet wrote:
> Greetings
> 
> I did a comparison myself of Bio::Index::GenBank between DB_FILE and SDBM
> on the latest version of the files from the Genbank primate division using
> a Compaq with 376K of memory and a 2.4GHz Pentium 4 Xeon. I used the

Wow, now that's a machine in desperate need of a memory upgrade.

> environment variable to control the indexer. I got the latest release of
> Berkeley from SleepyCat. 
> 
> Using DB_FILE
> 
> real    38m32.751s
> user    6m36.070s
> sys     1m16.650s
> 
> Using SDBM
> 
> real    46m13.856s
> user    6m34.400s
> sys     1m15.010s

How loaded was that machine? (I'm assuming just the tests.)

> 
> A negligible difference. Has anyone tried to compare the libraries (or
> knows where someone has?)

I never compared the DB libraries, but I did compare regexps against
XML::SAX for tigr.pm, and found that using the XML::SAX module on top of
Expat was horribly slow. The start-to-finish times were <5m for the
regexps and >25m for XML::SAX, even though the regexps were parsing
considerably more data out at the time. The regexps took 4-5m of CPU
time and XML::SAX took 5-8m, meaning that XML::SAX sat idle for quite a
while during its run.
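
To put a number on "idle": the time a process spends waiting is roughly
wall-clock time minus (user + sys). Here is a small sketch of that
arithmetic, fed with the `time` figures quoted above for the two index
runs. It is in Python purely for illustration, and the helper names
to_seconds and idle_report are made up for this example.

```python
import re

def to_seconds(stamp):
    """Convert a shell `time` value such as '38m32.751s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time value: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)

def idle_report(real, user, sys_):
    """Split wall-clock time into CPU time (user + sys) and the remainder."""
    wall = to_seconds(real)
    cpu = to_seconds(user) + to_seconds(sys_)
    return {"wall": wall, "cpu": cpu, "idle": wall - cpu, "cpu_share": cpu / wall}

# real, user, sys exactly as quoted in the message above
runs = {
    "DB_FILE": ("38m32.751s", "6m36.070s", "1m16.650s"),
    "SDBM": ("46m13.856s", "6m34.400s", "1m15.010s"),
}

for name, figures in runs.items():
    r = idle_report(*figures)
    print(f"{name}: wall {r['wall']:.0f}s, cpu {r['cpu']:.0f}s, "
          f"idle {r['idle']:.0f}s, cpu share {r['cpu_share']:.0%}")
```

On those figures the CPU time of the two runs differs by only a few
seconds, so the gap in wall-clock time is time spent off the CPU, not
extra work done by either library.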

I have a feeling that Perl yields its timeslice for whatever reason when
switching from a library to Perl code, making libraries that do a lot of
function calls (such as XML parsers and DBs) very slow in terms of
latency.
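
One way to test a hunch like this would be to compare wall-clock and CPU
time around the suspect code and count voluntary context switches, i.e.
the times a process gives up the CPU before its timeslice runs out. The
following is only a rough sketch of the measurement, in Python and
assuming a Unix system (the resource module is Unix-only). The names
measure and the time.sleep stand-in workload are hypothetical; nothing
here has been run against Perl, XML::SAX or a DB library.

```python
import resource
import time

def measure(func, *args):
    """Run func(*args) and report wall time, CPU time and context switches."""
    before = resource.getrusage(resource.RUSAGE_SELF)
    wall_start = time.perf_counter()
    cpu_start = time.process_time()

    func(*args)

    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    after = resource.getrusage(resource.RUSAGE_SELF)
    return {
        "wall": wall,
        "cpu": cpu,
        # Time the process was not running on a CPU (blocked or descheduled).
        "idle": wall - cpu,
        # Switches where the process gave up the CPU itself (e.g. blocking I/O).
        "voluntary_switches": after.ru_nvcsw - before.ru_nvcsw,
        # Switches forced by the scheduler (timeslice expired, preemption).
        "involuntary_switches": after.ru_nivcsw - before.ru_nivcsw,
    }

# Stand-in workload: sleeping blocks the process, so most of the wall time is idle.
result = measure(time.sleep, 0.1)
for key, value in result.items():
    print(f"{key}: {value}")
```

A large idle figure together with a high voluntary count would point at
blocking (I/O, locks) rather than the scheduler taking the CPU away.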

Can anyone confirm or deny this?

Thanks,

-- 

------------------------------------------------------
| Josh Lauricha            | Ford, you're turning    |
| laurichj at bioinfo.ucr.edu | into a penguin. Stop    |
| Bioinformatics, UCR      | it                      |
|----------------------------------------------------|
| OpenPG:                                            |
|  4E7D 0FC0 DB6C E91D 4D7B C7F3 9BE9 8740 E4DC 6184 |
|----------------------------------------------------|