[Biopython] Problem with pdb-file parsing

Tue Sep 8 17:45:53 UTC 2009

Hi,

I don't know whether this is a bug or whether I did something wrong. I am
parsing the PDB structure 1a2d with the following code to get the
one-letter polypeptide sequence for chain A:

------------------CODE----------------
from Bio.PDB.PDBParser import PDBParser
from Bio.PDB.Polypeptide import PPBuilder

parser = PDBParser()
ppb = PPBuilder()
structure = parser.get_structure('tmp', '1a2d.pdb')
# build_peptides returns a list of polypeptides; use model 0, chain A
polypeptide = ppb.build_peptides(structure[0]['A'])
# sequence of the first polypeptide in that list only
sequence = str(polypeptide[0].get_sequence())

print(sequence)
------------------CODE----------------

This, however, gives me a sequence that is one amino acid shorter than
expected. The structure contains one HETATM block within the ATOM block
of chain A (pos 117), which gets translated into an 'X' in the sequence.
The following amino acid at position 118 (VAL) seems to be missing.
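
One way to cross-check what the file itself lists around position 117,
independent of Bio.PDB, is to read the fixed-column ATOM/HETATM records
directly. The following is only a sketch: the helper name list_residues,
the sample records, and the residue name UNK are made up for illustration
and are not copied from 1a2d.pdb.

```python
def list_residues(pdb_lines, chain_id):
    """Return one (record, resname, resseq) tuple per residue, in file order."""
    residues = []
    seen = set()
    for line in pdb_lines:
        record = line[0:6].strip()           # columns 1-6: record name
        if record not in ("ATOM", "HETATM"):
            continue
        if line[21:22] != chain_id:          # column 22: chain identifier
            continue
        resname = line[17:20].strip()        # columns 18-20: residue name
        resseq = int(line[22:26])            # columns 23-26: residue number
        icode = line[26:27].strip()          # column 27: insertion code
        key = (resname, resseq, icode)
        if key in seen:                      # further atoms of a known residue
            continue
        seen.add(key)
        residues.append((record, resname, resseq))
    return residues


# Synthetic records, truncated to the columns read above (NOT real 1a2d data).
sample = [
    "ATOM    900  CA  GLU A 116",
    "ATOM    901  CB  GLU A 116",
    "HETATM  908  CA  UNK A 117",
    "ATOM    920  CA  VAL A 118",
    "ATOM    927  CA  MET A 119",
]

for record, resname, resseq in list_residues(sample, "A"):
    print("%-6s %s %d" % (record, resname, resseq))
```

Passing open('1a2d.pdb') instead of the sample list should show whether
residue 118 is present in the file right after the HETATM residue.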

So the resulting sequence around the X is:
...VEXMK...
To my understanding this should be:
...VEXVMK...
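
The expected string comes from mapping every residue name to its
one-letter code, with anything non-standard becoming 'X'. As a rough
sketch of that reasoning (the function name to_one_letter and the
placeholder name UNK for the HETATM residue are assumptions, not
Biopython API):

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def to_one_letter(resnames):
    """Translate residue names to a one-letter string; unknown names give 'X'."""
    return "".join(THREE_TO_ONE.get(name, "X") for name in resnames)


# Residues around position 117 as described above; UNK is a placeholder.
print(to_one_letter(["VAL", "GLU", "UNK", "VAL", "MET", "LYS"]))  # VEXVMK
```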

Is this behaviour intended? Is it a bug? The Biopython version is 1.49
(Ubuntu Jaunty).

Chris