[Biopython] Overhauling of Bio.PDB module
Patrick Kunzmann
padix.kleber at gmail.com
Wed Oct 16 16:37:44 UTC 2019
Hello Biopythoneers,
At BOSC this year, we talked about overhauling the Bio.PDB module.
The problem is that currently the atom coordinates are stored in a
separate NumPy array for each atom. This design prevents efficient
computation of all kinds of analyses (distances, angles,
superimpositions, etc.). One possible solution we discussed
is to put the coordinates of the entire structure in
one NumPy array, and let the Atom, Residue, Chain and Structure objects
point to positions in this array. The benefit of this approach is that
functions could be applied directly to the entire array, harnessing
the power of vectorization.
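
To make the idea concrete, here is a minimal sketch of what I mean. The
AtomView class and its attributes are made up for illustration and are
not part of Bio.PDB; the point is only that each atom object holds an
index into one shared (n_atoms, 3) array instead of its own array.

```python
import numpy as np


class AtomView:
    """Hypothetical atom object that points to one row of a shared array."""

    def __init__(self, coords, index, name):
        self._coords = coords  # shared (n_atoms, 3) array, not a copy
        self.index = index
        self.name = name

    @property
    def coord(self):
        # Basic integer indexing returns a view, not a copy,
        # so edits made through it land in the shared array.
        return self._coords[self.index]


# One array for the entire structure
coords = np.array([
    [0.0, 0.0, 0.0],
    [1.5, 0.0, 0.0],
    [1.5, 1.5, 0.0],
])
atoms = [AtomView(coords, i, name) for i, name in enumerate(["N", "CA", "C"])]

# A whole-structure operation is a single vectorized call
centered = coords - coords.mean(axis=0)

# A per-atom edit is visible in the shared array
atoms[1].coord[:] = [2.0, 0.0, 0.0]
```

Residue, Chain and Structure objects could work the same way, holding
a slice or an index array instead of a single index.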
For the analysis we could adapt the vectorized functions from the Python
package Biotite, a project I am currently working on
(https://www.biotite-python.org/apidoc/biotite.structure.html). Usually,
these functions already accept the coordinates as a NumPy array, so I
think only a few tweaks would be necessary for each function.
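
As an illustration of the kind of function I have in mind, here is a
small sketch of a pairwise distance calculation that takes plain
coordinates. This is not the Biotite implementation; the function name
pairwise_distances is my own, and the code only shows the general
pattern of operating on an (n_atoms, 3) array via broadcasting.

```python
import numpy as np


def pairwise_distances(coords):
    """Return the (n, n) matrix of Euclidean distances between all atoms.

    Illustrative helper, not taken from Biopython or Biotite.
    """
    coords = np.asarray(coords, dtype=float)
    # Broadcasting (n, 1, 3) against (1, n, 3) gives all difference vectors
    diff = coords[:, np.newaxis, :] - coords[np.newaxis, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


coords = np.array([
    [0.0, 0.0, 0.0],
    [3.0, 4.0, 0.0],
    [0.0, 0.0, 5.0],
])
dist = pairwise_distances(coords)
```

With per-atom arrays, the same result needs a nested Python loop over
all atom pairs, which is where the current design becomes slow.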
However, we would need one person or a small team willing to
implement the new structure types and adapt the analysis
functions. I could lend a hand with adapting the
analysis functions, but I don't have the time for anything more.
So the question is: Is there anyone out there who is willing to do this
work? Alternatively, I would propose writing a 'bridge' package between
Biopython and Biotite that converts the Biopython structure
representation into the Biotite representation and vice versa. I
think this solution is less elegant, but it would also require less effort.
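
A rough sketch of the Biopython-to-array half of such a bridge could
look like the following. It relies only on the get_coord() and
get_name() methods that Bio.PDB atoms provide; the function name
atoms_to_arrays and the DemoAtom stand-in are hypothetical, and the
Biotite side of the conversion is left out here.

```python
import numpy as np


def atoms_to_arrays(atoms):
    """Collect coordinates and atom names from atom-like objects.

    Hypothetical helper: expects objects with get_coord() and get_name(),
    as Bio.PDB atoms have.
    """
    atoms = list(atoms)
    coords = np.array(
        [atom.get_coord() for atom in atoms], dtype=float
    ).reshape(-1, 3)
    names = np.array([atom.get_name() for atom in atoms], dtype=str)
    return coords, names


class DemoAtom:
    """Stand-in for a Bio.PDB atom so the sketch runs without Biopython."""

    def __init__(self, name, coord):
        self._name = name
        self._coord = np.array(coord, dtype=float)

    def get_name(self):
        return self._name

    def get_coord(self):
        return self._coord


demo_atoms = [DemoAtom("N", [0.0, 0.0, 0.0]), DemoAtom("CA", [1.46, 0.0, 0.0])]
coords, names = atoms_to_arrays(demo_atoms)
```

With a real Biopython structure, the call would presumably be
atoms_to_arrays(structure.get_atoms()). Note that this copies the
coordinates, so changes in one representation would not show up in
the other.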
Best regards
Patrick Kunzmann