[Biopython] Overhauling of Bio.PDB module

Wed Oct 16 16:37:44 UTC 2019

Hello Biopythoneers,

At BOSC this year we talked about overhauling the Bio.PDB module. 
The problem is that the coordinates of each atom are currently stored 
in a separate NumPy array. This design prevents efficient computation 
of many kinds of analyses (distances, angles, superimpositions, etc.). 
One solution we discussed was to put the coordinates of the entire 
structure in one NumPy array and let the Atom, Residue, Chain and 
Structure objects point to positions in this array. The benefit of this 
approach is that functions could be applied directly to the entire 
array, harnessing the power of vectorization.
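
To make the idea concrete, here is a minimal sketch of what "objects 
pointing into one array" could look like. The AtomView class and its 
attributes are hypothetical names chosen for illustration; this is not 
the existing Bio.PDB Atom class and not a finished design, only one way 
the approach might work, assuming plain NumPy.

```python
import numpy as np


class AtomView:
    """Hypothetical lightweight atom that refers to one row of a shared array."""

    def __init__(self, coords, index, name):
        self._coords = coords  # shared (n_atoms, 3) array, not a copy
        self.index = index     # row of this atom in the shared array
        self.name = name

    @property
    def coord(self):
        # Basic integer indexing returns a view, so no data is copied
        return self._coords[self.index]

    @coord.setter
    def coord(self, value):
        # Writes go straight into the shared array
        self._coords[self.index] = value


# One array holds the coordinates of the whole (toy) structure
coords = np.array([[0.0, 0.0, 0.0],
                   [1.5, 0.0, 0.0],
                   [1.5, 2.0, 0.0]])
atoms = [AtomView(coords, i, name)
         for i, name in enumerate(["N", "CA", "C"])]

# Vectorized analysis: all pairwise distances in one expression
diff = coords[:, np.newaxis, :] - coords[np.newaxis, :, :]
dist_matrix = np.sqrt((diff ** 2).sum(axis=-1))

# In-place translation of the whole structure; every AtomView sees it
coords += np.array([1.0, 0.0, 0.0])
```

Because each AtomView only stores an index into the shared array, the 
per-atom interface can stay close to what users know, while analysis 
functions operate on the full array at once.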

For the analysis we could adapt the vectorized functions from the Python 
package Biotite, a project I am currently working on 
(https://www.biotite-python.org/apidoc/biotite.structure.html). Usually, 
these functions already accept the coordinates as a NumPy array, so I 
think only a few tweaks would be necessary for each function.

However, we would need one person or a small team willing to make the 
effort to implement the new structure types and adapt the analysis 
functions. I could offer a pair of helping hands in adapting the 
analysis functions, but I don't have the time for anything more.

So the question is: is there anyone out there who is willing to do this 
work? Alternatively, I would propose writing a 'bridge' package between 
Biopython and Biotite that converts the Biopython structure 
representation into the Biotite representation and vice versa. I think 
this solution is less elegant, but it would also require less effort.

Best regards

Patrick Kunzmann