[Biopython-dev] RMSD calculation

Kristian Rother krother at rubor.de
Tue Nov 2 11:15:05 UTC 2010


Hi Greg,

I think I can help to clear up the RMSD question.
(RMS or RMSD, however you abbreviate it, it's the same formula.)

The short answer is, the methods giving lower RMSD do something
conceptually very different from Bio.PDB.

Long answer:

- Bio.PDB.Superimposer does structure *superposition*. It takes pairs of
atoms and finds the rotation/translation that minimizes the RMSD over all
the pairs it is given. There is an analytical solution to this, returned
by the Kabsch algorithm from 1976 (see
http://www.pymolwiki.org/index.php/Kabsch). I'm quite sure
Biopython/SVDSuperimposer implements this algorithm.

- Services like the EBI SSM server do *structure alignment*. They take two
structures and try to find a set of residue pairs that fit each other
well. To do so, they calculate RMSDs along the way, but do not necessarily
use all the residues provided.
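
To make the first point concrete, here is a minimal NumPy sketch of a
Kabsch-style superposition. It is not the Biopython source; the function
name kabsch_rmsd and the toy coordinates are made up for illustration, and
it assumes the two arrays already hold the same atoms in the same order.

```python
import numpy as np


def kabsch_rmsd(reference, mobile):
    """Minimal RMSD between two (N, 3) arrays of paired atom coordinates.

    Hypothetical helper for illustration, not part of Biopython.
    """
    reference = np.asarray(reference, dtype=float)
    mobile = np.asarray(mobile, dtype=float)

    # Centre both sets on their centroids; this removes the translation.
    ref_c = reference - reference.mean(axis=0)
    mob_c = mobile - mobile.mean(axis=0)

    # SVD of the 3x3 covariance matrix yields the optimal rotation.
    u, _, vt = np.linalg.svd(mob_c.T @ ref_c)

    # Avoid an improper rotation (reflection) by flipping the last axis.
    if np.linalg.det(u @ vt) < 0:
        u[:, -1] = -u[:, -1]
    rotation = u @ vt

    # Same formula as in the quoted Biopython snippet: sqrt(sum(diff^2) / N).
    diff = mob_c @ rotation - ref_c
    return np.sqrt((diff * diff).sum() / len(reference))


# Toy coordinates (invented): rotate by 90 degrees about z, then translate.
ref = np.array([[0, 0, 0], [1, 0, 0], [0, 2, 0], [0, 0, 3]], dtype=float)
rot_z = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 1]], dtype=float)
moved = ref @ rot_z + np.array([5.0, -2.0, 1.0])

print(kabsch_rmsd(ref, moved))  # close to zero: a pure rigid-body move
```

Every pair passed in contributes to the result, so a few badly fitting
residues raise the RMSD even after the best possible fit.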

For instance, when submitting protein1 and protein2 to EBI, the output
tells me that

N(algn) = 31

meaning that 31 of the 36 residues were used to calculate the alignment.
When looking at the structures, these are probably on the N-terminus (see
picture).

==> the structure alignment algorithm discards the residues it does not
regard as useful for aligning; this is why the RMSD is lower.
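
The effect of discarding pairs can be shown with a small pure-Python
sketch. The coordinates and the 1.0 cutoff are invented for illustration,
and this is not how SSM selects its pairs; a real structure alignment uses
its own criteria and also refits on the pairs it keeps. The sketch only
shows that leaving out poorly fitting pairs lowers the reported RMSD.

```python
from math import sqrt


def rmsd(pairs):
    """RMSD over ((x, y, z), (x, y, z)) atom pairs, without any refitting."""
    total = sum((a - b) ** 2 for p, q in pairs for a, b in zip(p, q))
    return sqrt(total / len(pairs))


def distance(pair):
    """Euclidean distance between the two atoms of one pair."""
    p, q = pair
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


# Toy, made-up pairs: four fit well, the last one is a floppy "tail".
pairs = [
    ((0.0, 0.0, 0.0), (0.1, 0.0, 0.0)),
    ((1.0, 0.0, 0.0), (1.0, 0.1, 0.0)),
    ((2.0, 0.0, 0.0), (2.0, 0.0, 0.1)),
    ((3.0, 0.0, 0.0), (3.1, 0.0, 0.0)),
    ((4.0, 0.0, 0.0), (4.0, 3.0, 0.0)),
]

cutoff = 1.0  # hypothetical threshold, chosen only for this example
kept = [pair for pair in pairs if distance(pair) <= cutoff]

rmsd_all = rmsd(pairs)   # what a plain superposition would report
rmsd_kept = rmsd(kept)   # what an alignment reports over its subset

print(len(pairs), rmsd_all)
print(len(kept), rmsd_kept)
```

To compare Biopython with the other tools, pass only the residue pairs
that the alignment actually used to the superimposer.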


Do you think this explains all our observations?

Best regards,
    Kristian





> Hello everyone,
> I tried with pymol and it gives a value of 1.792 for the RMSD after
> alignment
> The EU bioinformatics server gives a value of 1.74
> VMD 1.62
> But SVD and PDB Superimposer gives a value 3.2
> I have attached the 2 PDB files concerned-is it something I am doing in
> calculating the RMSD using biopython?
> Thank you
>
> On Thu, Oct 28, 2010 at 1:46 PM, Peter
> <biopython at maubp.freeserve.co.uk>wrote:
>
>> On Thu, Oct 28, 2010 at 6:14 PM, George Devaniranjan
>> <devaniranjan at gmail.com> wrote:
>> > Yes there is a difference-for 2 proteins having exact same residues of
>> 36
>> > residues the values from 4 sources are as follows
>> > VMD RMSD=1.61
>> > SVD RMSD =3.2
>> > PDB RMSD=3.2
>> >
>> > From the EU Bioinformatics server (link below) RMSD =1.75
>> > (http://www.ebi.ac.uk/msd-srv/ssm/cgi-bin/ssmserver)
>> >
>> > So Biopython really is computing the RMSD and not RMS?
>> > Thanks you
>>
>> It has been a while since I looked at this (but I can still edit
>> the Warwick page if it is unclear).
>>
>> Which definition of RMSD are you using?
>>
>> Bio.PDB uses Bio.SVDSuperimposer, so they should be the same.
>> The comment for this code *says* it calculates the RMS deviation,
>> here:
>>
>>        diff=coords1-coords2
>>        l=coords1.shape[0]
>>        return sqrt(sum(sum(diff*diff))/l)
>>
>> Here variable l will be the number of atoms.
>>
>> What are the two examples you are using? Can you perhaps
>> share a small example pair of PDB files?
>>
>> Peter
>>
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: superpos.png
Type: image/png
Size: 172427 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biopython-dev/attachments/20101102/f02741f3/attachment-0002.png>

