[Biopython-dev] why HETERO-flag in residue identifier (Bio.PDB.Residue)?

Hongbo Zhu macrozhu at gmail.com
Tue Jan 25 08:17:13 UTC 2011

By redundant, I mean that a residue can be unambiguously determined by using
(PDB code, model id, chain id, residue sequence identifier + icode).
The HETERO-flag itself is definitely not redundant information for a residue.
But it seems to be redundant in the residue ID, according to a small test on
~22,000 remediated PDB files.
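
To make the kind of check I mean concrete, here is a minimal sketch (not the
exact script used for the test). It assumes each residue exposes an .id tuple
of (hetero-flag, sequence identifier, icode), as Bio.PDB residues do.
FakeResidue and find_ambiguous_positions are hypothetical names introduced
only for this illustration, so the snippet runs without any PDB file.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for a Bio.PDB residue: only the .id tuple
# (hetero-flag, sequence identifier, insertion code) is used here.
FakeResidue = namedtuple("FakeResidue", ["id"])


def find_ambiguous_positions(residues):
    """Map (resseq, icode) -> list of full ids, for positions used more than once.

    An empty result means the hetero-flag is not needed to tell the
    residues of this chain apart.
    """
    by_position = defaultdict(list)
    for residue in residues:
        _hetflag, resseq, icode = residue.id
        by_position[(resseq, icode)].append(residue.id)
    return {pos: ids for pos, ids in by_position.items() if len(ids) > 1}


# Toy chain: the hetero residue at 547 does not collide with anything.
chain = [
    FakeResidue((" ", 546, " ")),
    FakeResidue(("H_MSE", 547, " ")),
    FakeResidue((" ", 548, " ")),
]
ambiguous = find_ambiguous_positions(chain)
```

With a real structure, the same function could be called on each chain of
each model, since iterating over a Bio.PDB chain yields its residues.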

This redundancy sometimes causes unnecessary problems. For example, in DSSP,
residues are identified by sequence identifier + icode. When parsing
DSSP output, some residues cannot be located in the PDB structure stored in
Bio.PDB.Structure, because sequence identifier + icode is not enough to
identify a residue in Biopython. One example is:
3jui 0 A 547
In the protein structure, sequence identifier + icode determines this residue
unambiguously. But in Biopython, one has to specify ('H_MSE',
547, ' ') to locate this residue. (Note that we can also simply use 547
without icode to locate it. But we don't want to accidentally forget icode
in our script, do we :).
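
As a workaround, one can scan the chain and match on sequence identifier +
icode only, ignoring the hetero-flag. The following is a minimal sketch under
the same assumption about the .id tuple; find_residue and FakeResidue are
hypothetical names and not part of Biopython.

```python
from collections import namedtuple

# Hypothetical stand-in for a Bio.PDB residue; only .id is inspected.
FakeResidue = namedtuple("FakeResidue", ["id", "resname"])


def find_residue(chain, resseq, icode=" "):
    """Return the single residue at (resseq, icode), whatever its hetero-flag.

    Raises KeyError if nothing matches and ValueError if the position is
    ambiguous, i.e. if the hetero-flag really is needed to disambiguate.
    """
    matches = [r for r in chain if r.id[1] == resseq and r.id[2] == icode]
    if not matches:
        raise KeyError((resseq, icode))
    if len(matches) > 1:
        raise ValueError("ambiguous position: %r" % [r.id for r in matches])
    return matches[0]


# Toy chain mimicking the 3jui example: residue 547 is flagged as hetero.
chain = [
    FakeResidue((" ", 546, " "), "LEU"),
    FakeResidue(("H_MSE", 547, " "), "MSE"),
]
residue = find_residue(chain, 547, " ")
```

The linear scan is slower than a dictionary lookup, but it only needs the
two fields that DSSP actually reports.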

Peter pointed out that the hetero-flag in the residue ID might exist because
of mistakes in the old PDB files before remediation. If that is the
case, the hetero-flag should be retained for backwards compatibility.


On Tue, Jan 25, 2011 at 12:23 AM, João Rodrigues <anaryin at gmail.com> wrote:

> To be really honest, I don't understand the problem with the flag. I don't
> really see it as redundant. Could you please explain better?
