[Biopython] Key Error

Andrew Perry ajperry at pansapiens.com
Thu Apr 12 03:07:57 UTC 2012


Hi Mark,

The problem is arising since 1FAT is missing coordinates for residue 37 in
chain A,B,C and D. This is very common for protein structures in the PDB,
and can be for many reasons - it's often the case that the structural
biologist who determined the structure left out this residue since their
data didn't allow them to determine it's position with confidence.

By using range(num_atoms_to_do), you are assuming that there will be no
numbers missing in the sequence ... not the case ! [also, I think you
really mean range(num_of_residues_to_do) ].

The solution would be to do something like this:

from Bio.PDB.PDBParser import PDBParser
pdb_filename ='./1fat.pdb'
parser = PDBParser(PERMISSIVE=1)
structure = parser.get_structure("1fat", pdb_filename)
model = structure[0]
chain = model["A"]
for residue1 in chain:
  resnum = residue1.get_id()[1]
  atom1 = residue1['CA']

This loop will return residue objects in the Chain object, without caring
if there is a residue missing in the sequence.

(I can see how this could be confusing, since without looking at the
source, it seems the Bio.PDB.Chain.Chain object mostly behaves like
Python sequence
object (eg a list), but behaves like a dictionary when __getitem__ is
called on it via chain[some_key] . I'm sure there's some good reason for
that :) )

The next thing you may find it that you hit a non-amino acid ligand "NAG"
without a 'CA' atom. Use something like:

if not "CA" in residue1:
  continue

to catch that.

Also, just a pedantic note on terminology that may help in reading the docs
and further questions - "A", "B", "C" and "D" are chains in PDB
terminology. A "model" is something different (usually only found in NMR
structures with multiple models per PDB file).

Hope this helps,

Andrew Perry

Postdoctoral Fellow
Whisstock Lab
Department of Biochemistry and Molecular Biology
Monash University, Clayton Campus, PO Box 13d, VIC, 3800, Australia.
Mobile: +61 409 808 529


On Thu, Apr 12, 2012 at 11:57 AM, Mark Livingstone <
livingstonemark at gmail.com> wrote:

> Hi Guys,
>
> I am about 3 days into learning BioPython using the current EPD 32 bit
> Mac OS X academic distribution . When I run the included code, it
> works fine if I do
>
> num_atoms_to_do = 36
>
> but if I try to do any more, I get a key error.
>
> I am using the 1fat.pdb since that is what your tutes seem to use. The
> only thing that I note is that when comparing residues in model A & C,
> it is at residue 37 that the letters are no longer the same. However,
> since I only look at Model A in the first part of the code, I can't
> see that this should be a factor?
>
> Program output:
>
> <snipped>
>
> >From residue 0
> 0.000000 3.795453 5.663135 9.420295 12.296334 15.957790 19.201048
> 22.622383 25.621159 27.803837 27.483303 28.365652 28.663099 27.070955
> 23.441793 24.625151 26.016047 29.257225 31.299105 34.907970 36.100464
> 32.837784 33.310841 32.332653 35.280716 36.668713 35.319839 31.738102
> 31.607626 30.115669 33.220890 32.487370 36.199894 39.507755 42.668190
> 46.333019
> Traceback (most recent call last):
>  File "./inter_atom_distance.py", line 47, in <module>
>    residue2 = chain[y+1]
>  File
> "/Library/Frameworks/Python.framework/Versions/7.2/lib/python2.7/site-packages/Bio/PDB/Chain.py",
> line 67, in __getitem__
>    return Entity.__getitem__(self, id)
>  File
> "/Library/Frameworks/Python.framework/Versions/7.2/lib/python2.7/site-packages/Bio/PDB/Entity.py",
> line 38, in __getitem__
>    return self.child_dict[id]
> KeyError: (' ', 37, ' ')
>
> <code>
>
>
> #! /usr/bin/env python
>
> # Initial idea for this code from
>
> http://stackoverflow.com/questions/6437391/how-to-get-distance-between-two-atoms-using-for-loop
>
> print "\n\n\033[95mMark's PDB Geometry experimentation code\033[0m\n\n"
> print "This code prints a set of c-alpha to c-alpha distances which
> are colourised so that distances < 5.0 are green\n"
> print "otherwise are colourised red. If you are using Microsoft
> Windows, you may need to load an ansi.sys driver in your
> config.sys\n\n"
>
> print "Any errors below in yellow are due to the .pdb file not being
> properly well formed.\n\n\033[93m"
>
> from Bio.PDB.PDBParser import PDBParser
> pdb_filename ='./1fat.pdb'
> parser = PDBParser(PERMISSIVE=1)
> structure = parser.get_structure("1fat", pdb_filename)
> model = structure[0]
> chain = model["A"]
> residue1 = chain[1]
> print "\033[0m"
> print residue1.get_resname() # SER
> residue2 = chain[2]
> print residue2.get_resname() # ASN
>
> atom1 = residue1['CA']
> print atom1.get_name()
> print atom1.get_coord()
> atom2 = residue2['CA']
> print atom2.get_name()
> print atom2.get_coord()
> distance = atom1-atom2 # minus is overloaded to do 3D vector math - how
> clever!!
> print"%s to %s euclidean distance = %f Angstroms"  %
> (residue1.get_resname(), residue2.get_resname(), distance)
> print "%d models in pdb file named %s" % (len(model), pdb_filename) #
> 4 models in pdb
> print "%d residues in model 1" % len(chain) # 239 residues
> print "%s has %d atoms" % (residue1.get_resname(), len(residue1)) #
> SER has 6 atoms
>
> print "Length of Model 'A' is %d and Model 'C' is %d" %
> (len(model['A']), len(model['C']))
>
> print ("\n\033[93mDistances between C-alpha atoms of residues in model 1
> \n")
>
> num_atoms_to_do = 37
>
> for x in range(num_atoms_to_do):
>    print "\033[0mFrom residue %d" % x
>    residue1 = chain[x+1]
>    atom1 = residue1['CA']
>
>    for y in range(num_atoms_to_do):
>        residue2 = chain[y+1]
>        atom2 = residue2['CA']
>        distance = (atom1 - atom2)
>        if distance < 5.0:
>            print("\033[92m%f" % distance),
>        else:
>            print("\033[91m%f" % distance),
>    print "\n"
>
> print "\n\033[93mDistances between C-alpha atoms of residues in model
> 1 to model 3 \n"
> print "NB: These have NOT been superimposed - thus the large distances
> between matched atoms\033[0m\n"
>
>
> num_atoms_to_do = 37
>
> for x in range(num_atoms_to_do):
>    print "\033[0mFrom residue %d" % x
>    model = structure[0]
>    chain = model["A"]
>    residue1 = chain[x+1]
>    atom1 = residue1['CA']
>
>    for y in range(num_atoms_to_do):
>        model = structure[0]
>        chain = model["C"]
>        residue2 = chain[y+1]
>        atom2 = residue2['CA']
>
>        distance = (atom1 - atom2)
>        if distance < 5.0:
>            print("\033[92m%f" % distance),
>        else:
>            print("\033[91m%f" % distance),
>    print "\n"
>
>
> print "\033[0m"
>
> </code>
>
> Thanks in advance,
>
> MarkL
>
> _______________________________________________
> Biopython mailing list  -  Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython
>



More information about the Biopython mailing list