[Biopython] Reading PDB files containing multiple copies of the same molecule

João Rodrigues j.p.g.l.m.rodrigues at gmail.com
Fri Nov 1 22:27:11 UTC 2019


Hi Alister,

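The Biopython parser identifies unique residues by chain ID, so residues that
share a chain ID and residue number are treated as duplicates even when their
segment IDs differ. For a quick solution, you can use the pdb_segxchain tool
from https://pypi.org/project/pdb-tools/ to move the segment ID into the chain
ID field, then re-read the file with Bio.PDB.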
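If you would rather stay inside Python than call pdb-tools, the same idea can be sketched as below. This is not the pdb_segxchain implementation, and `segids_to_chains`, `parse_with_segids`, `CHAIN_POOL` and `COORD_RECORDS` are hypothetical names. The sketch assumes a fixed-column PDB file with the chain ID in column 22 and the SEGID in columns 73-76, and at most 62 distinct (chain, SEGID) pairs, because the chain ID field holds a single character. Each pair gets its own chain ID before the text is handed to `PDBParser`, so identical residue IDs in different segments no longer collide.

```python
import io
import string

# Single-character chain IDs that fit in column 22 of a PDB line.
# Assumption: the file has at most 62 distinct (chain, SEGID) pairs.
CHAIN_POOL = string.ascii_uppercase + string.ascii_lowercase + string.digits

# Record types that carry both a chain ID (col 22) and a SEGID (cols 73-76).
COORD_RECORDS = ("ATOM  ", "HETATM", "ANISOU")


def segids_to_chains(lines):
    """Give every distinct (chain ID, SEGID) pair its own chain ID.

    Returns the rewritten lines and the mapping that was used.
    Lines that are not coordinate records are passed through unchanged.
    """
    mapping = {}  # (old chain, segid) -> new chain; dicts keep insertion order
    out = []
    for line in lines:
        if line.startswith(COORD_RECORDS):
            # Pad short lines so the fixed-column slices below are always valid.
            body = line.rstrip("\n").ljust(80)
            key = (body[21], body[72:76])
            if key not in mapping:
                if len(mapping) == len(CHAIN_POOL):
                    raise ValueError("more segments than single-character chain IDs")
                mapping[key] = CHAIN_POOL[len(mapping)]
            # Overwrite column 22 only; the original SEGID stays in cols 73-76.
            line = body[:21] + mapping[key] + body[22:] + "\n"
        out.append(line)
    return out, mapping


def parse_with_segids(path, structure_id="model"):
    """Remap SEGIDs to chain IDs in memory, then parse the text with Bio.PDB."""
    # Imported here so segids_to_chains() can be used without Biopython installed.
    from Bio.PDB import PDBParser

    with open(path) as handle:
        lines, mapping = segids_to_chains(handle)
    # get_structure() accepts an open file handle as well as a file name.
    structure = PDBParser().get_structure(structure_id, io.StringIO("".join(lines)))
    return structure, mapping
```

Usage would look like `structure, mapping = parse_with_segids("input.pdb")`, where "input.pdb" is a placeholder path; `mapping` records which SEGID ended up under which chain ID. Keying on the (chain, SEGID) pair rather than the SEGID alone keeps molecules apart even if the same SEGID is reused in different chains, and the helper raises `ValueError` rather than silently reusing a chain ID when it runs out.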

Cheers,

João

On Friday, 1/11/2019, 15:21, Alister Burt <alisterburt at gmail.com> wrote:

> Hi all,
>
> Apologies if there’s an easy solution to this, but a quick Google search
> didn’t turn up anything!
>
> I’m trying to use Bio.PDB.PDBParser.get_structure() to read a PDB file
> from a collaborator. The file contains multiple copies of a few
> different molecules, differentiated by the SEGID entry in columns 73-76 of
> each coordinate record.
>
> When I read this file, I get the following warning once for each atom
> of a residue that was already defined:
>
>     /Users/alisterburt/anaconda/envs/py37/lib/python3.7/site-packages/Bio/PDB/PDBParser.py:291:
>     PDBConstructionWarning: PDBConstructionException: ('H_POP', 26, ' ') defined twice at line 76812.
>     Exception ignored.
>     Some atoms or residues may be missing in the data structure.
>       % message, PDBConstructionWarning)
>
> This means the resulting Structure object only contains one copy of each
> molecule.
>
> I know this SEGID entry is not part of the official PDB format. Does
> anyone have a quick solution that will let me read in all atoms from
> this file?
>
> Thanks in advance,
>
> Alister
> _______________________________________________
> Biopython mailing list  -  Biopython at mailman.open-bio.org
> https://mailman.open-bio.org/mailman/listinfo/biopython