[Biopython-dev] RMSD calculation

Peter biopython at maubp.freeserve.co.uk
Fri Oct 29 11:37:08 UTC 2010

```On Thu, Oct 28, 2010 at 9:42 PM, George Devaniranjan
<devaniranjan at gmail.com> wrote:
> Hello everyone,
> I tried with pymol and it gives a value of 1.792 for the RMSD after
> alignment
> The EU bioinformatics server gives a value of 1.74
> VMD 1.62
> But SVD and PDB Superimposer gives a value 3.2
> I have attached the 2 PDB files concerned-is it something I am doing in
> calculating the RMSD using biopython?
> Thank you

Are you doing the same comparison in each case? For example,
are some doing only C-alpha atoms, while others use all atoms?

Thanks for the example files - but you forgot the sample Python code ;)

import Bio.PDB
import numpy
structure1 = Bio.PDB.PDBParser().get_structure("one", "protein1.pdb")
structure2 = Bio.PDB.PDBParser().get_structure("two", "protein2.pdb")
super_imposer = Bio.PDB.Superimposer()
super_imposer.set_atoms(list(structure1.get_atoms()),
list(structure2.get_atoms()))
print super_imposer.rms

This gives 3.19274942026 for all atoms, as you said. Using Bio.SVSuperimposer,

coord1 = np.array([atom.coord for atom in structure1.get_atoms()])
coord2 = np.array([atom.coord for atom in structure2.get_atoms()])
from Bio.SVDSuperimposer import SVDSuperimposer
sup=SVDSuperimposer()
sup.set(coord1, coord2)
print sup.run()
print sup.get_rms()

Again, 3.19274953249 after moving the atoms. Alternatively if we
bypass the alignment step and calculate the RMS of the unaligned
structures we get a much higher RMS:

sup=SVDSuperimposer()
print sup._rms(coord1, coord2) #private method, don't use normally
14.8866750536

This matches what I get by doing it explicitly via numpy:

import numpy as np
coord1 = np.array([atom.coord for atom in structure1.get_atoms()])
coord2 = np.array([atom.coord for atom in structure2.get_atoms()])
assert coord1.shape == coord2.shape
diff = coord1-coord2
l = len(diff) #number of atoms
from math import sqrt
print sqrt(sum(sum(diff*diff))/l)
print np.sqrt(np.sum(diff**2)/l) #should give same result as above line

(This should be the same calculation as Bio.PDB.Superimposer uses)

So, I think there are two potential sources of the disagreement
(1) The alignment, and (2) the RMS calculation. Can you use
the other tools to get the RMS without aligning the structures?
Alternatively, can you save their aligned structures and give
that to Biopython?

Peter

P.S. Why doesn't file protein2.pdb have the elements column?

```