[Biopython] Reading PDB files containing multiple copies of the same molecule

Fri Nov 1 22:21:12 UTC 2019

Hi all,

Apologies if there’s an easy solution to this, but a quick Google search didn’t turn up anything!

I’m trying to use Bio.PDB.PDBParser.get_structure() to read a PDB file from a collaborator. The file contains multiple copies of a few different molecules, differentiated by the SEGID entry in columns 73-76 of the file.
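
To make the layout concrete, here is a minimal sketch that groups the ATOM/HETATM records by the SEGID field. It uses only the standard library; the function name split_by_segid is hypothetical and not part of Biopython, and it assumes fixed-width records with the SEGID in columns 73-76 as described above.

```python
from collections import defaultdict


def split_by_segid(pdb_lines):
    """Group ATOM/HETATM records by the SEGID field (columns 73-76).

    Hypothetical helper, not a Biopython function. Records shorter than
    76 columns end up under a shortened or empty SEGID key.
    """
    groups = defaultdict(list)
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            # Columns 73-76 are 1-based and inclusive, i.e. the slice [72:76].
            segid = line[72:76].strip()
            groups[segid].append(line.rstrip("\n"))
    return dict(groups)
```

Each group could then be joined with newlines, wrapped in io.StringIO and passed to PDBParser.get_structure(), which accepts a file handle as well as a filename, giving one Structure per SEGID. This is an untested idea for the file in question rather than a confirmed fix.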

When I try to read this file, I get the following warning once for each atom in a chain that was already defined:
> /Users/alisterburt/anaconda/envs/py37/lib/python3.7/site-packages/Bio/PDB/PDBParser.py:291: PDBConstructionWarning: PDBConstructionException: ('H_POP', 26, ' ') defined twice at line 76812.
> Exception ignored.
> Some atoms or residues may be missing in the data structure.
>   % message, PDBConstructionWarning)

This means the resulting Structure object only contains one copy of each molecule.

I know this SEGID entry is not part of the official PDB format. Does anyone have a quick solution that will allow me to read in all atoms from this file?

Thanks in advance,

Alister