[Biopython-dev] why HETERO-flag in residue identifier (Bio.PDB.Residue)?

Hongbo Zhu macrozhu at gmail.com
Mon Jan 24 18:25:17 UTC 2011


Hi,

I was recently working on the Biopython module DSSP.py. The module had some
problems reading DSSP output. One of them was caused by the different
descriptions of the residue identifier in DSSP and Biopython. As we all
know, in Biopython a residue identifier consists of three fields
(hetero-flag, sequence identifier, insertion code), but DSSP uses only the
latter two. This can sometimes cause unnecessary exceptions (see
http://bugzilla.open-bio.org/show_bug.cgi?id=3166 ).
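
To illustrate the mismatch, here is a minimal sketch that uses plain tuples
in place of real Bio.PDB residue ids. The helper name find_full_ids and the
sample ids are made up for illustration; they are not part of Bio.PDB or
DSSP.py.

```python
# Biopython-style full residue ids: (hetero-flag, sequence identifier, icode).
# These values are illustrative only, not taken from a real structure.
chain_ids = [
    (" ", 65, " "),
    ("H_XYG", 66, " "),
    (" ", 67, " "),
    ("W", 201, " "),
]


def find_full_ids(chain_ids, resseq, icode=" "):
    """Return every full id whose (resseq, icode) matches a DSSP-style key."""
    return [rid for rid in chain_ids if rid[1] == resseq and rid[2] == icode]


# Guessing a blank hetero-flag misses the hetero-residue at position 66 ...
print((" ", 66, " ") in chain_ids)
# ... while matching on the last two fields only still finds it.
print(find_full_ids(chain_ids, 66))
```

A lookup that builds the full id with a blank hetero-flag would miss residue
66 here, which is one way the DSSP key and the Biopython key can disagree.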

In retrospect, I started to wonder why Biopython includes the hetero-flag in
the residue identifier. After checking several Biopython documents, I found
that "The Biopython Structural Bioinformatics FAQ" answers this question:
"The reason for the hetero-flag is that many, many PDB files use the same
sequence identifier for an amino acid and a hetero-residue or a water, which
would create obvious problems if the hetero-flag was not used."

I got interested in the issue and scanned a subset of the PDB (a
non-redundant set of ~22,000 PDB entries derived using PISCES,
http://dunbrack.fccc.edu/PISCES.php ). I found ~30 cases in which the same
sequence identifier + icode is used for more than one residue (see below).
I checked all of them. It turned out that in all of these cases, although
the same sequence identifier + icode is used for different residues, the
residues have different alternative locations. This means they can still be
distinguished if alternative locations are considered. Biopython already
handles alternative locations well.
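
The check described above can be sketched roughly as follows. This is not
the script used for the scan; the record layout, the function names and the
sample values (including the hetero-flags and altloc letters) are
hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (chain id, hetero-flag, resseq, icode, altloc).
records = [
    ("A", "H_AAA", 67, " ", "A"),
    ("A", "H_BBB", 67, " ", "B"),
    ("A", " ", 68, " ", " "),
]


def find_shared_positions(records):
    """Group residues by (chain, resseq, icode); keep keys used more than once."""
    groups = defaultdict(list)
    for chain, hetflag, resseq, icode, altloc in records:
        groups[(chain, resseq, icode)].append((hetflag, altloc))
    return {key: members for key, members in groups.items() if len(members) > 1}


def distinguishable_by_altloc(members):
    """True if every residue sharing a position has a distinct altloc."""
    altlocs = [altloc for _hetflag, altloc in members]
    return len(set(altlocs)) == len(altlocs)


for key, members in find_shared_positions(records).items():
    print(key, members, distinguishable_by_altloc(members))
```

Any position for which distinguishable_by_altloc returns False would be a
case where the alternative location alone cannot separate the residues.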

So it seems to me that the hetero-flag is somewhat redundant in the residue
identifier. It should also be fine if the hetero-flag were just given as an
attribute of residues (I still need to scan all the PDB entries to confirm
my claim). I would like to hear your opinions about the hetero-flag in the
residue identifier.
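
As a rough sketch of what I mean, assuming a made-up ResidueSketch class
(this is not Bio.PDB.Residue, and the sample residues are illustrative), the
hetero-flag could be stored on the residue while the key uses only the
sequence identifier and the insertion code:

```python
from dataclasses import dataclass


@dataclass
class ResidueSketch:
    """Hypothetical residue record; not the Bio.PDB.Residue class."""

    resname: str
    resseq: int
    icode: str = " "
    hetflag: str = " "  # kept as an attribute, not part of the key

    @property
    def key(self):
        """Two-field identifier, the same shape DSSP uses."""
        return (self.resseq, self.icode)


residues = [
    ResidueSketch("GLY", 65),
    ResidueSketch("XYG", 66, hetflag="H_XYG"),
]
chain = {res.key: res for res in residues}

# A DSSP-style (resseq, icode) key finds the residue directly,
# and the hetero-flag is still available when it is needed.
hit = chain[(66, " ")]
print(hit.resname, hit.hetflag)
```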

cheers,
hongbo zhu

Duplicate: 2pxs  0 A ('H_XYG', 66, ' ')
Duplicate: 2pxs  0 B ('H_XYG', 66, ' ')
Duplicate: 3bln  0 A ('H_MPD', 147, ' ')
Duplicate: 3ned  0 A ('H_CH6', 67, ' ')
Duplicate: 3ned  0 A ('H_NRQ', 67, ' ')
Duplicate: 3l4j  0 A ('H_PTR', 782, ' ')
Duplicate: 1ysl  0 B (' ', 111, ' ')
Duplicate: 3gju  0 A (' ', 289, ' ')
Duplicate: 3fcr  0 A ('H_LLP', 288, ' ')
Duplicate: 1xpm  0 A (' ', 111, ' ')
Duplicate: 1xpm  0 B (' ', 111, ' ')
Duplicate: 1xpm  0 C (' ', 111, ' ')
Duplicate: 1xpm  0 D (' ', 111, ' ')
Duplicate: 2vqr  0 A ('H_DDZ', 57, ' ')
Duplicate: 3piu  0 A (' ', 273, ' ')
Duplicate: 2w8s  0 A ('H_FGL', 57, ' ')
Duplicate: 2w8s  0 B ('H_FGL', 57, ' ')
Duplicate: 2w8s  0 C ('H_FGL', 57, ' ')
Duplicate: 2w8s  0 D ('H_FGL', 57, ' ')
Duplicate: 2wpn  0 B ('H_PSW', 489, ' ')
Duplicate: 2wpn  0 B ('H_PSW', 489, ' ')
Duplicate: 3a0m  0 F (' ', 13, ' ')
Duplicate: 3a0m  0 F (' ', 16, ' ')
Duplicate: 3a0m  0 F (' ', 13, ' ')
Duplicate: 3a0m  0 F (' ', 16, ' ')
Duplicate: 2ci1  0 A ('H_K1R', 273, ' ')
Duplicate: 2uv2  0 A ('H_TPO', 183, ' ')
Duplicate: 3d3w  0 B ('H_CSO', 138, ' ')
Duplicate: 3hvy  0 A ('H_LLP', 243, ' ')
Duplicate: 3hvy  0 B ('H_LLP', 243, ' ')
Duplicate: 3hvy  0 C ('H_LLP', 243, ' ')
Duplicate: 3hvy  0 D ('H_LLP', 243, ' ')
Duplicate: 2j6v  0 A ('H_ALY', 229, ' ')
Duplicate: 2j6v  0 B ('H_ALY', 229, ' ')

-- 
Hongbo



