[Biopython-dev] [Biopython (old issues only) - Bug #3379] (Closed) PDBParser fails to parse PDBs produced by PatchDock

redmine at redmine.open-bio.org
Sun Jul 24 01:21:33 UTC 2016


Issue #3379 has been updated by Travis Wrightsman.

Status changed from New to Closed
% Done changed from 0 to 100

Pull request closed.

----------------------------------------
Bug #3379: PDBParser fails to parse PDBs produced by PatchDock
https://redmine.open-bio.org/issues/3379#change-15314

* Author: David Cain
* Status: Closed
* Priority: Low
* Assignee: Biopython Dev Mailing List
* Category: Main Distribution
* Target version: 1.57
* URL: 
----------------------------------------
I apologize in advance if this doesn't technically count as a bug, since the problem arises from improperly formatted PDB files.


h3. Background

Protein docking utilities generally create a complex PDB file from two input files. Depending on the rotation algorithm, at least one of the PDB files is rotated (its ATOM coordinates are modified in place), and the two files are then concatenated to create a protein complex file.
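To make the concatenation step concrete, here is a minimal Python sketch under simplifying assumptions: the helpers `atom_line` and `rotate_z` are hypothetical, the "rotation" is a plain rotation about the z axis, and the records carry only the fixed columns a parser reads. Only the coordinate columns of ATOM/HETATM lines are rewritten; the ligand is then appended to the receptor verbatim, so the receptor's CONECT and END records end up in the middle of the file.

```python
import math


def atom_line(serial, name, resname, chain, resseq, x, y, z, element):
    """Build a fixed-column PDB ATOM record (altLoc and iCode left blank)."""
    return (f"ATOM  {serial:5d} {name:4s} {resname:>3s} {chain}{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}          {element:>2s}")


def rotate_z(lines, degrees):
    """Rotate ATOM/HETATM coordinates about the z axis; every other record,
    including END and CONECT, is passed through unchanged."""
    theta = math.radians(degrees)
    c, s = math.cos(theta), math.sin(theta)
    out = []
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")):
            # x, y, z occupy columns 31-38, 39-46 and 47-54 (1-based)
            x, y, z = (float(line[i:i + 8]) for i in (30, 38, 46))
            x, y = c * x - s * y, s * x + c * y
            line = f"{line[:30]}{x:8.3f}{y:8.3f}{z:8.3f}{line[54:]}"
        out.append(line)
    return out


receptor = [
    atom_line(1, " N  ", "ALA", "A", 1, 0.0, 0.0, 0.0, "N"),
    atom_line(2, " CA ", "ALA", "A", 1, 1.458, 0.0, 0.0, "C"),
    "CONECT    1    2",
    "END",
]
ligand = [
    atom_line(1, " CA ", "GLY", "B", 1, 5.0, 0.0, 0.0, "C"),
    "END",
]

# Naive "complex": receptor as-is, rotated ligand appended after its END.
complex_lines = receptor + rotate_z(ligand, 90)
print("\n".join(complex_lines))
```

Nothing in this pipeline inspects the non-coordinate records, which is exactly why the receptor's trailer records survive into the complex.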

h3. Why PDBParser fails

Utilities like ZDOCK strip a lot of data from the input files, producing a poorly formed PDB file that raises PDBConstructionWarnings but that PDBParser can ultimately parse. PatchDock, however, preserves the input PDB files as they were; the only thing that changes is the ATOM coordinates. This is a problem when the receptor PDB has an @END@ record or @CONECT@ records: PDBParser's current behavior is to treat everything after an @END@ or @CONECT@ record as trailer data and to stop parsing coordinates as soon as one is encountered. As a result, many complexes parse without any error but silently exclude the ligand.
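The failure mode can be illustrated with a deliberately simplified model of the behavior described above. This is not Biopython's actual @_parse_coordinates@ code; `parse_until_trailer` and `atom` are hypothetical helpers that only reproduce the rule "the first END or CONECT record starts the trailer".

```python
def atom(serial, chain, x, y, z):
    """Minimal fixed-column ATOM record (a CA atom of residue 1, ALA)."""
    return (f"ATOM  {serial:5d}  CA  ALA {chain}{1:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}  1.00  0.00           C")


def parse_until_trailer(lines):
    """Collect ATOM/HETATM records until the first END or CONECT record,
    then return everything from that record onwards as the "trailer"."""
    atoms = []
    for i, line in enumerate(lines):
        record = line[:6].strip()
        if record in ("END", "CONECT"):
            return atoms, lines[i:]
        if record in ("ATOM", "HETATM"):
            atoms.append(line)
    return atoms, []


complex_lines = [
    atom(1, "A", 1.0, 0.0, 0.0),   # receptor
    "END",                         # receptor's own END, left in by concatenation
    atom(1, "B", 0.0, 5.0, 0.0),   # ligand
    "END",
]
atoms, trailer = parse_until_trailer(complex_lines)
print(len(atoms), "atom(s) parsed;", len(trailer), "line(s) treated as trailer")
# prints: 1 atom(s) parsed; 3 line(s) treated as trailer
```

No exception is raised at any point; the ligand atom simply lands in the trailer, matching the "parses cleanly but excludes the ligand" symptom.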

h3. How to fix the problem

In an ideal world, the responsibility would be on the creators of the docking utilities to produce well-formed complex PDB files. However, this quick concatenation seems to be fairly common (complexes are often created by very short, hackish Perl scripts). Should PDBParser be able to parse these badly formed PDB files?

h3. Potential change to @PDBParser._parse_coordinates@?

If a modification to PDBParser is on the table, my idea would be to still consider anything after @END@ or @CONECT@ to be part of the trailer, but to attempt to parse additional coordinate records from the trailer before returning (probably through a recursive call). If coordinate records are found in the trailer, a PDBConstructionWarning is issued, but the records are still added to the structure.
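A minimal sketch of the proposed behavior, under stated assumptions: `parse_coordinates` is a hypothetical function working on a list of lines, and `PDBConstructionWarning` here is a local stand-in for Biopython's class of the same name in @Bio.PDB.PDBExceptions@. Everything from the first END or CONECT record is still returned as the trailer, but coordinate records found inside it are also added to the atom list, with a single warning. A plain loop replaces the recursive call suggested above, because a file with many CONECT records could otherwise hit Python's recursion limit.

```python
import warnings


class PDBConstructionWarning(UserWarning):
    """Local stand-in for Bio.PDB.PDBExceptions.PDBConstructionWarning."""


COORD_RECORDS = ("ATOM", "HETATM")
TRAILER_START = ("END", "CONECT")


def parse_coordinates(lines):
    """Return (atoms, trailer). The trailer starts at the first END/CONECT
    record as before, but ATOM/HETATM records inside it are recovered too."""
    atoms, recovered, trailer = [], [], []
    for line in lines:
        record = line[:6].strip()
        if trailer or record in TRAILER_START:
            trailer.append(line)          # unchanged: still part of the trailer
            if record in COORD_RECORDS:
                recovered.append(line)    # new: also keep its coordinates
        elif record in COORD_RECORDS:
            atoms.append(line)
    if recovered:
        warnings.warn(
            f"{len(recovered)} coordinate record(s) found after END/CONECT; "
            "adding them to the structure",
            PDBConstructionWarning,
        )
    return atoms + recovered, trailer


complex_lines = [
    "ATOM      1  CA  ALA A   1       1.000   0.000   0.000  1.00  0.00           C",
    "END",
    "ATOM      1  CA  GLY B   1       0.000   5.000   0.000  1.00  0.00           C",
    "END",
]
atoms, trailer = parse_coordinates(complex_lines)
```

With this two-atom complex both the receptor and the ligand atom are returned and one warning is emitted; a well-formed file whose only END is the last line produces no warning.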

If this approach is reasonable, let me know and I'd be happy to mock something up and push it to my branch on GitHub. Otherwise, I'll just write scripts to clean up ugly complexes before parsing.

My only concern is that most users of docking software are probably not able or willing to write such a script, and thus can't use Biopython to parse the PDB output.
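For the script-based workaround, a cleanup pass can be quite small. The following is a sketch under one stated assumption: the connectivity in CONECT records is not needed downstream (concatenated inputs may also reuse atom serial numbers, which would make those records ambiguous anyway). `clean_complex` and `clean_file` are hypothetical names. The sketch drops every END and CONECT record and terminates the file with a single END, so the coordinate section becomes contiguous.

```python
def clean_complex(lines):
    """Drop END and CONECT records wherever they occur (ENDMDL is kept) and
    finish the file with a single END record."""
    kept = [line for line in lines if line[:6].strip() not in ("END", "CONECT")]
    return kept + ["END"]


def clean_file(in_path, out_path):
    """Read a concatenated complex from in_path and write the cleaned copy."""
    with open(in_path) as handle:
        lines = handle.read().splitlines()
    with open(out_path, "w") as handle:
        handle.write("\n".join(clean_complex(lines)) + "\n")
```

For example, `clean_file("complex.1.pdb", "complex.1.clean.pdb")` writes a copy that can then be passed to `PDBParser().get_structure("complex", "complex.1.clean.pdb")`; since no END or CONECT record precedes the ligand any more, its atoms should be read along with the receptor's.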

---Files--------------------------------
complex.1.pdb (380 KB)

