[Biopython-dev] RMSD calculation

George Devaniranjan devaniranjan at gmail.com
Fri Oct 29 13:33:36 UTC 2010


Hello Peter,
Thanks for the answer-I also got the same values  using biopython both SVD
and PDB modules (3.2)
My concern arose as a result of VMD  + pymol + EU Bioinfomatics sever giving
a value of 1.7 (well  1.6 for VMD)
but biopython giving 3.2.
Even if the two groups calculate RMSD differently (that is : what atoms the
consider only CA or backbone or all atoms ) there is no way there can be
such a big discrepency between biopyhton and VMD/Pymol/EUServer
I tried RMSD calculation with VMD
WITHOUT alignment and for 14.02 as the answer.

For biopython PDB  module only CA gives 3.2 while backbone gives 3.1 which
is acceptable.
Thanks once again, let me know if you have any further thoughts on this.


On Fri, Oct 29, 2010 at 12:37 PM, Peter <biopython at maubp.freeserve.co.uk>wrote:

> On Thu, Oct 28, 2010 at 9:42 PM, George Devaniranjan
> <devaniranjan at gmail.com> wrote:
> > Hello everyone,
> > I tried with pymol and it gives a value of 1.792 for the RMSD after
> > alignment
> > The EU bioinformatics server gives a value of 1.74
> > VMD 1.62
> > But SVD and PDB Superimposer gives a value 3.2
> > I have attached the 2 PDB files concerned-is it something I am doing in
> > calculating the RMSD using biopython?
> > Thank you
>
> Are you doing the same comparison in each case? For example,
> are some doing only C-alpha atoms, while others use all atoms?
>
> Thanks for the example files - but you forgot the sample Python code ;)
>
> import Bio.PDB
> import numpy
> structure1 = Bio.PDB.PDBParser().get_structure("one", "protein1.pdb")
> structure2 = Bio.PDB.PDBParser().get_structure("two", "protein2.pdb")
> super_imposer = Bio.PDB.Superimposer()
> super_imposer.set_atoms(list(structure1.get_atoms()),
> list(structure2.get_atoms()))
> print super_imposer.rms
>
> This gives 3.19274942026 for all atoms, as you said. Using
> Bio.SVSuperimposer,
>
> coord1 = np.array([atom.coord for atom in structure1.get_atoms()])
> coord2 = np.array([atom.coord for atom in structure2.get_atoms()])
> from Bio.SVDSuperimposer import SVDSuperimposer
> sup=SVDSuperimposer()
> sup.set(coord1, coord2)
> print sup.run()
> print sup.get_rms()
>
> Again, 3.19274953249 after moving the atoms. Alternatively if we
> bypass the alignment step and calculate the RMS of the unaligned
> structures we get a much higher RMS:
>
> sup=SVDSuperimposer()
> print sup._rms(coord1, coord2) #private method, don't use normally
> 14.8866750536
>
> This matches what I get by doing it explicitly via numpy:
>
> import numpy as np
> coord1 = np.array([atom.coord for atom in structure1.get_atoms()])
> coord2 = np.array([atom.coord for atom in structure2.get_atoms()])
> assert coord1.shape == coord2.shape
> diff = coord1-coord2
> l = len(diff) #number of atoms
> from math import sqrt
> print sqrt(sum(sum(diff*diff))/l)
> print np.sqrt(np.sum(diff**2)/l) #should give same result as above line
>
> (This should be the same calculation as Bio.PDB.Superimposer uses)
>
> So, I think there are two potential sources of the disagreement
> (1) The alignment, and (2) the RMS calculation. Can you use
> the other tools to get the RMS without aligning the structures?
> Alternatively, can you save their aligned structures and give
> that to Biopython?
>
> Peter
>
> P.S. Why doesn't file protein2.pdb have the elements column?
>



More information about the Biopython-dev mailing list