[Biopython-dev] [Bug 2910] Bio.PDB build_peptides sometimes gives shorter peptide sequences than expected

Wed Sep 23 15:40:00 UTC 2009

http://bugzilla.open-bio.org/show_bug.cgi?id=2910

------- Comment #3 from biopython-bugzilla at maubp.freeserve.co.uk  2009-09-23 11:39 EST -------
I think the problem with PDB file 1A2D is due to the atypical PYX residue.
This script lists every residue that has a CA atom but fails the is_aa test:

from Bio.PDB.PDBParser import PDBParser
from Bio.PDB.Polypeptide import is_aa

structure = PDBParser().get_structure('tmp', '1A2D.pdb')
for model in structure:
    for chain in model:
        for res in chain:
            # Has an alpha carbon, yet is not recognised as an amino acid
            if "CA" in res.child_dict and not is_aa(res):
                print("%s %s" % (chain, res))

The polypeptide code only looks at residues that pass the is_aa test, which
means we can ignore things like water molecules associated with a chain. In
this PDB file there are two residues which have a CA atom but fail this test:

<Chain id=A> <Residue PYX het=H_PYX resseq=117 icode= >
<Chain id=B> <Residue PYX het=H_PYX resseq=117 icode= >
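To illustrate why one unrecognised residue shortens the result, here is a toy
model only, not Biopython's actual build_peptides code. It assumes a peptide is
simply cut wherever a residue name is not in the known set. The function name
and the neighbouring residues are made up; only PYX at position 117 comes from
the output above.

```python
def split_peptides(residues, known_names):
    """Toy model: cut the chain wherever a residue name is not recognised."""
    peptides, current = [], []
    for name, resseq in residues:
        if name in known_names:
            current.append((name, resseq))
        elif current:
            # Unrecognised residue: close the peptide built so far
            peptides.append(current)
            current = []
    if current:
        peptides.append(current)
    return peptides

# Hypothetical fragment of a chain; only PYX 117 is taken from the report.
chain = [("GLY", 115), ("ALA", 116), ("PYX", 117), ("GLY", 118), ("ALA", 119)]
known = {"ALA", "CYS", "GLY"}

before = split_peptides(chain, known)            # PYX not recognised
after = split_peptides(chain, known | {"PYX"})   # PYX recognised
print(len(before), len(after))
```

In this toy model the chain comes back as two short pieces until PYX is
recognised, after which it comes back as one piece of five residues.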

According to the SEQADV and MODRES lines, these are modified CYS residues.
Comparing this to the PDB-provided FASTA file, a "C" is used (CYS). This
leads me to believe the fix is to add the PYX -> C mapping to Biopython.
[The dictionary used, to_one_letter_code, is actually defined in file
Bio/SCOP/Raf.py for some historical reason.]
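A rough sketch of what the proposed mapping would change, using a small
stand-in dictionary rather than the real to_one_letter_code table. The names
three_to_one and to_sequence are hypothetical, the table holds only a few
entries, and writing "X" for an unknown residue is my assumption here, not
necessarily what Biopython does.

```python
# Stand-in lookup table (assumption: the real one is much larger).
three_to_one = {"ALA": "A", "CYS": "C", "GLY": "G"}

def to_sequence(resnames, table):
    """One-letter string; residues missing from the table become 'X'."""
    return "".join(table.get(name.upper(), "X") for name in resnames)

# Copy the table and add the proposed PYX -> C entry.
patched = dict(three_to_one)
patched["PYX"] = "C"

names = ["GLY", "PYX", "ALA"]  # made-up residue order for illustration
print(to_sequence(names, three_to_one))
print(to_sequence(names, patched))
```

With the extra entry, PYX is reported as "C", which matches the letter used in
the PDB-provided FASTA file.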

The PDB documentation suggests there are potentially many more cases like
this, where a HETATM entry that Biopython does not recognise is in fact a
modified amino acid residue. See:
ftp://ftp.wwpdb.org/pub/pdb/data/monomers/
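
For a single file, the MODRES records already name the standard residue that a
modified one derives from, as was used for PYX above. A minimal sketch of
reading them, assuming the fixed-column layout of the PDB format (resName in
columns 13-15, stdRes in columns 25-27). The function name is hypothetical and
the sample record is built by hand; its comment field is not copied from 1A2D.

```python
def modres_mapping(lines):
    """Return {modified residue name: standard residue name} from MODRES records."""
    mapping = {}
    for line in lines:
        if line.startswith("MODRES"):
            res_name = line[12:15].strip()  # columns 13-15
            std_res = line[24:27].strip()   # columns 25-27
            if res_name and std_res:
                mapping[res_name] = std_res
    return mapping

# Hand-built sample record laid out in the fixed columns (illustrative only).
sample = "MODRES %4s %3s %1s %4d%1s %3s  %s" % (
    "1A2D", "PYX", "A", 117, " ", "CYS", "ILLUSTRATIVE COMMENT")
found = modres_mapping([sample, "REMARK   1"])
print(found)
```

Running this over many PDB files would give a list of candidate three-letter
codes to compare against the existing dictionary.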

Christian - did you find any other problem PDB files?

Peter
