[Biopython-dev] Benchmarking PDBParser

Fri May 13 02:26:42 UTC 2011

On Thu, May 12, 2011 at 9:59 AM, João Rodrigues <anaryin at gmail.com> wrote:

> First results: http://www.biopython.org/wiki/PDBParser
>
> Comments?
>

Cool. So the atom_element additions did slow the parser down noticeably. The
warnings may have caused some tiny slowdown, presumably when handling PDB
files with inconsistencies, but I personally am not concerned about that.

I think atom element assignment could be sped up in either of two ways:
(a) Optimize Atom._assign_element itself for speed.
(b) Store only the atom field as a string during parsing, and make
Atom.element and Atom.mass properties that parse that field to determine
the element on demand. That is, self._get_element checks whether
self._element has been set yet and, if not, parses the string and sets
it; self._get_mass would be basically identical to _assign_atom_mass.

The lazy loading approach (b) would be faster if you're not using the
element/mass values at all, but probably a little slower if you need those
values from every atom in a structure.
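
To make (b) concrete, here is a rough sketch of what I have in mind. It is
not the actual Bio.PDB.Atom code: LazyAtom, _guess_element, parse_count and
the tiny mass table are all made up for illustration, and the element
heuristic is deliberately oversimplified (it would get calcium or iron
wrong, for instance).

```python
class LazyAtom:
    """Hypothetical stand-in for an atom class; not Biopython's Atom."""

    # Illustrative mass table (assumption); a real one covers all elements.
    _MASSES = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06}

    def __init__(self, fullname):
        # At parse time, keep only the raw atom-name field as a string.
        self.fullname = fullname
        self._element = None   # filled in on first access
        self.parse_count = 0   # only here to show how often we parse

    def _guess_element(self):
        # Oversimplified heuristic (assumption): the first alphabetic
        # character of the atom name, upper-cased.
        self.parse_count += 1
        for ch in self.fullname:
            if ch.isalpha():
                return ch.upper()
        return "X"

    @property
    def element(self):
        # Parse the name the first time only, then reuse the cached value.
        if self._element is None:
            self._element = self._guess_element()
        return self._element

    @property
    def mass(self):
        # Mass is derived from the (lazily determined) element.
        return self._MASSES.get(self.element, float("nan"))


atom = LazyAtom(" CA ")
print(atom.parse_count)   # nothing parsed yet
print(atom.element, atom.mass)
print(atom.parse_count)   # parsed exactly once, even after two lookups
```

Construction costs nothing beyond storing the string, which is where the
parsing speedup would come from; the first access to element or mass pays
the parsing cost plus the overhead of the property call.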

-E