[Biopython-dev] why HETERO-flag in residue identifier (Bio.PDB.Residue)?

Peter biopython at maubp.freeserve.co.uk
Mon Jan 24 23:08:59 UTC 2011

On Mon, Jan 24, 2011 at 6:25 PM, Hongbo Zhu <macrozhu at gmail.com> wrote:
> Hi,
> I was recently working on the Biopython module DSSP.py. There were some
> problems in the module when reading DSSP output. One of them was due to
> different descriptions of the residue identifier in DSSP and Biopython. As we
> all know, in Biopython the residue identifier consists of three fields
> (hetero-flag, sequence identifier, insertion code). ...
> I became interested in the issue and scanned a subset of the PDB
> (a non-redundant set of ~22,000 PDB entries derived using PISCES,
> http://dunbrack.fccc.edu/PISCES.php). I found ~30 cases in which the same
> sequence identifier + icode is used for more than one residue (see below).
> I checked all of them. It turned out that in all of these cases, though the
> same sequence identifier + icode is used for different residues, the residues
> have different alternative locations. This means they can still be
> distinguished if alternative locations are considered. In Biopython,
> alternative locations are always handled very well.
> So it seems to me that the hetero-flag is somewhat redundant in the residue
> identifier. It should also be fine if the hetero-flag were just given as an
> attribute of residues (I still need to scan all the PDB entries to confirm
> my claim). I would like to hear your opinions about the hetero-flag in the
> residue identifier.

It may be that prior to the big PDB remediation (clean-up) this was a
real and common problem. Certainly your investigation suggests
this isn't the case now.
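
For anyone who wants to repeat this kind of check, here is a rough sketch of
the scan described above. It is not Hongbo's script and it does not use
Bio.PDB. It reads the fixed-column ATOM/HETATM fields directly, groups atoms
by (chain, sequence identifier, icode), and reports positions shared by more
than one residue name. The function names, the make_line helper and the
sample records are all made up for illustration, and the hetero-flag values
(" ", "W", "H_" + residue name) follow the Biopython convention as I
understand it.

```python
from collections import defaultdict


def hetero_flag(record, resname):
    """Hetero-flag in the style Bio.PDB uses (assumed convention)."""
    if record != "HETATM":
        return " "
    if resname in ("HOH", "WAT"):
        return "W"
    return "H_" + resname


def shared_positions(pdb_lines):
    """Map (chain, resseq, icode) to its residues when the position is
    used by more than one residue name."""
    seen = defaultdict(set)
    for line in pdb_lines:
        record = line[0:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue
        altloc = line[16]
        resname = line[17:20].strip()
        chain = line[21]
        resseq = int(line[22:26])
        icode = line[26]
        flag = hetero_flag(record, resname)
        seen[(chain, resseq, icode)].add((flag, resname, altloc))
    return {
        pos: residues
        for pos, residues in seen.items()
        if len({resname for _flag, resname, _alt in residues}) > 1
    }


def distinguished_by_altloc(residues):
    """True if no alternative location is shared by two residue names."""
    names_by_altloc = defaultdict(set)
    for _flag, resname, altloc in residues:
        names_by_altloc[altloc].add(resname)
    return all(len(names) == 1 for names in names_by_altloc.values())


def make_line(record, serial, name, altloc, resname, chain, resseq, icode=" "):
    """Hypothetical helper: build the leading fixed-width PDB columns."""
    return (f"{record:<6}{serial:>5} {name:<4}{altloc}{resname:>3} "
            f"{chain}{resseq:>4}{icode}")


# Synthetic records (not taken from any real PDB entry): position 22 of
# chain A holds SER as altloc A and PRO as altloc B.
example = [
    make_line("ATOM", 1, " CA", "A", "SER", "A", 22),
    make_line("ATOM", 2, " CA", "B", "PRO", "A", 22),
    make_line("ATOM", 3, " CA", " ", "GLY", "A", 23),
    make_line("HETATM", 4, " O", " ", "HOH", "A", 101),
]

shared = shared_positions(example)
for pos, residues in shared.items():
    print(pos, sorted(residues), distinguished_by_altloc(residues))
```

On a real file you would pass the lines of the PDB file instead of the
synthetic list. A position that is reported but not distinguished by
alternative location would be a case where the hetero-flag (or something
else) is still needed to tell the residues apart.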

