[BioPython] blast parser slows down under python2.3
Peter Slickers
piet at clondiag.com
Fri Aug 29 13:24:42 EDT 2003
When executed with Python 2.3, the Biopython BLAST parser runs at only
about half the speed seen with Python 2.2.
The effect is most noticeable with a very large BLAST output file.
My setup for measuring the performance is quite simple.
I used a small Python script that just parses a BLAST
file and stores the content in memory. I ran this
script with the time command, with the Python interpreter
explicitly specified as either python2.2 or python2.3.
Each run was repeated four times.
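If you prefer to stay inside Python rather than use the shell's time command, the same "four runs, CPU seconds" measurement can be approximated with the sketch below. It is an assumption-laden stand-in: parser.py is not shown in this post, so parse_blast_file() is a hypothetical placeholder that only reads the file into memory, and measure_cpu() is a made-up helper. It uses time.process_time() (Python 3.3+), which counts CPU time of the current process; unlike the time command it excludes interpreter startup.

```python
import time


def measure_cpu(func, *args, repeats=4):
    """Run func(*args) `repeats` times; return the CPU seconds of each run."""
    timings = []
    for _ in range(repeats):
        start = time.process_time()  # CPU time of this process, not wall-clock
        func(*args)
        timings.append(round(time.process_time() - start, 2))
    return timings


def parse_blast_file(path):
    """Hypothetical stand-in for parser.py: just read the file into memory."""
    with open(path) as handle:
        return handle.readlines()
```

Usage would look like measure_cpu(parse_blast_file, "blastout.txt"), run once under each interpreter you want to compare.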
--------------------------------------------------------------
command                                  CPU time (s), 4 runs
--------------------------------------------------------------
time python2.2 parser.py blastout.txt    5.11, 3.58, 3.98, 4.15
time python2.3 parser.py blastout.txt    8.85, 7.97, 7.30, 7.12
--------------------------------------------------------------
(with biopython 1.21)
I stumbled onto this when running the Python profiler
on the BLAST parser. It turned out that more
than half of the CPU time was spent in the warnings module,
which is part of the standard Python installation
(/usr/local/lib/python2.3/warnings.py).
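The profiling step can be reproduced with a small sketch. It uses cProfile (added in Python 2.5; Python 2.3 only had the pure-Python profile module, which offers essentially the same Profile/runcall interface) and a hypothetical noisy_call() that, like apply() under Python 2.3, goes through warnings.warn() on every call. In current Python versions warn() is implemented in C, so it appears as a built-in in the report rather than as code from warnings.py.

```python
import cProfile
import io
import pstats
import warnings


def noisy_call(x):
    # Hypothetical hot spot: every call passes through warnings.warn().
    warnings.warn("deprecated", PendingDeprecationWarning)
    return x + 1


def workload(n=50000):
    return sum(noisy_call(i) for i in range(n))


def profile_report(func, top=5):
    """Profile func() and return the top entries sorted by cumulative time."""
    profiler = cProfile.Profile()
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # silence output; warn() still runs
        profiler.runcall(func)
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(top)
    return out.getvalue()


if __name__ == "__main__":
    print(profile_report(workload))
```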
Further digging revealed that the function warn() is called
each time the readline() method of class UndoHandle is
executed (file site-packages/Bio/File.py).
The readline() method makes heavy use of the Python built-in
function apply(). But since Python 2.3, apply() is deprecated, and
therefore the interpreter calls the warn() function every time
apply() is used.
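To illustrate why a deprecation warning can cost so much even when nothing is printed, the sketch below routes a call through a hypothetical legacy_apply() that first calls warnings.warn() and then makes the call, and times it against a direct call with timeit. legacy_apply() only mimics the behavior described above (apply() no longer exists in Python 3), and the warning category is chosen for illustration; absolute timings depend on the Python version and machine.

```python
import timeit
import warnings


def legacy_apply(func, args=(), kwargs=None):
    """Hypothetical stand-in for Python 2.3's apply(): warn, then call."""
    warnings.warn("apply() is deprecated; use func(*args, **kwargs)",
                  PendingDeprecationWarning, stacklevel=2)
    return func(*args, **(kwargs or {}))


def compare(number=100000):
    """Time the same call made through legacy_apply() and directly."""
    args, kwargs = ("acgt",), {}
    with warnings.catch_warnings():
        # The warning is ignored, but warn() still runs on every call.
        warnings.simplefilter("ignore")
        via_warn = timeit.timeit(lambda: legacy_apply(str.upper, args, kwargs),
                                 number=number)
    direct = timeit.timeit(lambda: str.upper(*args, **kwargs), number=number)
    return via_warn, direct


if __name__ == "__main__":
    slow, fast = compare()
    print(f"through warn(): {slow:.3f} s, direct call: {fast:.3f} s")
```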
According to the Python 2.3 manual, apply() should be
replaced by the "extended call syntax" (introduced
in Python 2.0).
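For the simple three-argument form used in Bio/File.py, the substitution is mechanical: apply(f, args, kw) becomes f(*args, **kw). The sketch below does this with a regular expression; rewrite_apply() is a hypothetical helper, not the method used for the patch, and it deliberately handles only calls whose three arguments contain no commas or parentheses (the two-argument form apply(f, args) is not covered). Every rewritten line should still be reviewed by hand; the apply fixer of the 2to3 tool handles more cases.

```python
import re

# Matches only apply(func, args, kwargs) where none of the three
# arguments contains a comma or a parenthesis.
_APPLY_CALL = re.compile(
    r"\bapply\(\s*([^,()]+?)\s*,\s*([^,()]+?)\s*,\s*([^,()]+?)\s*\)"
)


def rewrite_apply(source):
    """Replace apply(f, args, kw) with f(*args, **kw) in a source string."""
    return _APPLY_CALL.sub(r"\1(*\2, **\3)", source)


old = "line = apply(self._handle.readline, args, keywds)"
print(rewrite_apply(old))
```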
To test my hypothesis that the performance loss is caused by
apply(), I took the standard genetic approach
of knockout and complementation: I created a modified version
of Bio/File.py in which all occurrences of apply() were replaced
by the "extended call syntax". After that, I ran the benchmark again:
--------------------------------------------------------------
command                                  CPU time (s), 4 runs
--------------------------------------------------------------
time python2.2 parser.py blastout.txt    4.11, 3.53, 4.07, 4.03
time python2.3 parser.py blastout.txt    4.94, 4.96, 4.54, 5.24
--------------------------------------------------------------
(with modified Bio/File.py)
The numbers clearly show that my patch largely restores
the speed of the BLAST parser under Python 2.3.
Conclusion: the "newer, better, faster" dogma does not hold for Python.
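The effect of the patch can be quantified with a little arithmetic on the run times from the two tables. This sketch averages the four runs per interpreter and prints the Python 2.3 / Python 2.2 ratio for each version of Bio/File.py; the numbers are copied from the tables, nothing else is assumed.

```python
from statistics import mean

# CPU times in seconds, copied from the two tables in this post.
timings = {
    "biopython 1.21": {"python2.2": [5.11, 3.58, 3.98, 4.15],
                       "python2.3": [8.85, 7.97, 7.30, 7.12]},
    "modified Bio/File.py": {"python2.2": [4.11, 3.53, 4.07, 4.03],
                             "python2.3": [4.94, 4.96, 4.54, 5.24]},
}

for label, runs in timings.items():
    t22 = mean(runs["python2.2"])
    t23 = mean(runs["python2.3"])
    print(f"{label}: 2.2 = {t22:.2f} s, 2.3 = {t23:.2f} s, "
          f"ratio = {t23 / t22:.2f}")
```

The averages come out at about 4.2 s versus 7.8 s (ratio about 1.86) with Biopython 1.21, and about 3.9 s versus 4.9 s (ratio about 1.25) with the modified Bio/File.py, so the patch removes most, though not all, of the gap.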
Here is an example of what the patch looks like:
  old: line = apply(self._handle.readline, args, keywds)
  new: line = self._handle.readline(*args, **keywds)
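To show the patched call in context, here is a hypothetical, heavily simplified wrapper in the spirit of UndoHandle. The class name SimpleUndoHandle and its exact behavior are assumptions for illustration, not Biopython's code; the point is that readline() forwards its arguments with the extended call syntax, exactly as in the one-line patch, so no deprecated apply() call (and no warn() call) happens per line.

```python
import io


class SimpleUndoHandle:
    """Hypothetical file wrapper that lets callers push lines back.

    Lines returned to the wrapper with saveline() are handed out again
    by readline() before anything new is read from the wrapped handle.
    """

    def __init__(self, handle):
        self._handle = handle
        self._saved = []  # lines that were "un-read", most recent first

    def readline(self, *args, **keywds):
        if self._saved:
            return self._saved.pop(0)
        # Extended call syntax instead of apply(self._handle.readline, args, keywds)
        return self._handle.readline(*args, **keywds)

    def saveline(self, line):
        if line:  # ignore the empty string returned at end of file
            self._saved.insert(0, line)

    def peekline(self):
        line = self.readline()
        self.saveline(line)
        return line


h = SimpleUndoHandle(io.StringIO("BLASTN 2.2.6\nQuery= test\n"))
print(repr(h.peekline()))  # first line, without consuming it
print(repr(h.readline()))  # the same line again
```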
--
Peter
-------------------------------------------------------------------
Peter Slickers piet at clondiag.com
Clondiag Chip Technologies http://www.clondiag.com/
Löbstedter Str. 105
07749 Jena
Germany
Fon: 03641/5947-65 Fax: 03641/5947-20
-------------------------------------------------------------------