[Biopython-dev] [Bug 3096] PPBuilder build_peptides bugs

bugzilla-daemon at portal.open-bio.org bugzilla-daemon at portal.open-bio.org
Fri Aug 13 22:23:24 UTC 2010


http://bugzilla.open-bio.org/show_bug.cgi?id=3096


skong at zymeworks.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Version|Not Applicable              |1.53




------- Comment #3 from skong at zymeworks.com  2010-08-13 18:23 EST -------
Hi Peter,

I managed to reproduce the problem without modifying _accept().

DIAGNOSTIC SCRIPT:
from Bio.PDB.PDBParser import PDBParser
from Bio.PDB.Polypeptide import PPBuilder

def extract_peptides(model):
    """Extract the peptides from a model.
    Return a list of peptide sequences as strings."""
    output = []
    for peptide in PPBuilder().build_peptides(model):
        seq = str(peptide.get_sequence())
        output.append(seq)
    return output

if __name__ == '__main__':

    # Parse the chopped PDB fragment listed below and close the file afterwards
    with open('chopped_pdb1bfe_noca.ent') as pdb:
        st = PDBParser().get_structure('', pdb)
    seqa = extract_peptides(st)
    print('no ca seq all')
    print(seqa)


PDB FILE: chopped_pdb1bfe_noca.ent
ATOM     85  N   ILE A 316      37.386  71.217  31.070  1.00 36.97           N  
ATOM     86  CA  ILE A 316      38.311  71.290  29.949  1.00 33.71           C  
ATOM     87  C   ILE A 316      37.634  72.103  28.862  1.00 33.93           C  
ATOM     88  O   ILE A 316      36.415  72.216  28.839  1.00 36.46           O  
ATOM     89  CB  ILE A 316      38.651  69.876  29.404  1.00 35.79           C  
ATOM     90  CG1 ILE A 316      39.331  69.049  30.501  1.00 36.78           C  
ATOM     91  CG2 ILE A 316      39.572  69.979  28.187  1.00 37.71           C  
ATOM     92  CD1 ILE A 316      39.881  67.724  30.023  1.00 39.20           C  
ATOM     93  N   HIS A 317      38.425  72.679  27.969  1.00 35.61           N  
ATOM     94  CA  HIS A 317      37.880  73.473  26.881  1.00 37.92           C  
ATOM     95  C   HIS A 317      38.360  72.928  25.540  1.00 37.79           C  
ATOM     96  O   HIS A 317      39.463  73.240  25.094  1.00 37.44           O  
ATOM     97  CB  HIS A 317      38.303  74.930  27.052  1.00 35.19           C  
ATOM     98  CG  HIS A 317      37.888  75.519  28.363  1.00 35.76           C  
ATOM     99  ND1 HIS A 317      36.611  75.981  28.602  1.00 37.74           N  
ATOM    100  CD2 HIS A 317      38.575  75.701  29.516  1.00 37.59           C  
ATOM    101  CE1 HIS A 317      36.529  76.420  29.844  1.00 38.74           C  
ATOM    102  NE2 HIS A 317      37.706  76.262  30.421  1.00 36.76           N  
ATOM    103  N   ARG A 318      37.527  72.109  24.905  1.00 38.78           N  
ATOM    104  CA  ARG A 318      37.884  71.512  23.627  1.00 42.04           C  
ATOM    105  C   ARG A 318      38.469  72.559  22.699  1.00 45.14           C  
ATOM    106  O   ARG A 318      39.592  72.425  22.205  1.00 42.05           O  
ATOM    107  CB  ARG A 318      36.657  70.880  22.967  1.00 42.93           C  
ATOM    108  CG  ARG A 318      36.934  70.321  21.576  1.00 38.60           C  
ATOM    109  CD  ARG A 318      35.654  70.038  20.821  1.00 35.39           C  
ATOM    110  NE  ARG A 318      34.624  69.538  21.724  1.00 34.96           N  
ATOM    111  CZ  ARG A 318      34.539  68.278  22.141  1.00 31.51           C  
ATOM    112  NH1 ARG A 318      35.419  67.373  21.736  1.00 25.19           N  
ATOM    113  NH2 ARG A 318      33.579  67.929  22.983  1.00 29.10           N  
ATOM    114  N   XLY A 319      37.690  73.604  22.461  1.00 49.96           N  
ATOM    115  CX  XLY A 319      38.138  74.668  21.592  1.00 55.53           C  
ATOM    116  C   XLY A 319      38.459  74.219  20.180  1.00 58.85           C  
ATOM    117  O   XLY A 319      37.583  73.766  19.440  1.00 58.98           O  
ATOM    118  N   SER A 320      39.734  74.334  19.823  1.00 61.64           N  
ATOM    119  CA  SER A 320      40.219  73.992  18.493  1.00 63.16           C  
ATOM    120  C   SER A 320      40.212  72.517  18.110  1.00 65.27           C  
ATOM    121  O   SER A 320      39.558  72.127  17.145  1.00 65.12           O  
ATOM    122  CB  SER A 320      41.634  74.542  18.316  1.00 65.36           C  
ATOM    123  OG  SER A 320      42.124  74.255  17.019  1.00 72.05           O  
ATOM    124  N   THR A 321      40.955  71.702  18.853  1.00 67.43           N  
ATOM    125  CA  THR A 321      41.049  70.274  18.562  1.00 67.73           C  
ATOM    126  C   THR A 321      40.220  69.430  19.529  1.00 66.41           C  
ATOM    127  O   THR A 321      39.244  69.917  20.095  1.00 70.21           O  
ATOM    128  CB  THR A 321      42.517  69.810  18.620  1.00 70.22           C  
ATOM    129  OG1 THR A 321      42.613  68.453  18.169  1.00 77.03           O  
ATOM    130  CG2 THR A 321      43.049  69.915  20.045  1.00 72.07           C  
ATOM    131  N   GLY A 322      40.608  68.168  19.707  1.00 61.22           N  
ATOM    132  CA  GLY A 322      39.892  67.286  20.614  1.00 53.23           C  
ATOM    133  C   GLY A 322      40.037  67.705  22.065  1.00 48.00           C  
ATOM    134  O   GLY A 322      40.138  68.892  22.372  1.00 50.41           O  
ATOM    135  N   LEU A 323      40.044  66.734  22.968  1.00 41.92           N  
ATOM    136  CA  LEU A 323      40.190  67.033  24.385  1.00 35.58           C  
ATOM    137  C   LEU A 323      41.613  66.738  24.874  1.00 31.41           C  
ATOM    138  O   LEU A 323      41.932  66.921  26.046  1.00 30.47           O  
ATOM    139  CB  LEU A 323      39.160  66.240  25.191  1.00 35.76           C  
ATOM    140  CG  LEU A 323      37.716  66.576  24.802  1.00 39.50           C  
ATOM    141  CD1 LEU A 323      36.733  65.796  25.670  1.00 38.15           C  
ATOM    142  CD2 LEU A 323      37.493  68.074  24.955  1.00 38.58           C



The output peptides should be ['IHR', 'STGL'], not ['IHRXTGL'] as in the current
version. Residue XLY A 319 (the X in the fourth position) should not be included,
since it has no CA atom. Instead, the current version includes it and drops the
'S' that follows it; both symptoms come from the same bug. The patch provided
earlier gives the correct output.
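
To make the expected result concrete, here is a minimal sketch that does not
use Biopython at all. It models each residue as a one-letter code plus a set of
backbone atom names (side-chain atoms are left out), and the function name
split_on_missing_ca is made up for illustration. It only captures the "no CA
means break the chain" rule described above, not the other checks that
build_peptides() performs.

```python
# Toy model, not Biopython code: each residue is (one_letter_code, atom_names).
# The data mirrors residues 316-323 of the PDB fragment above (backbone only).
residues = [
    ("I", {"N", "CA", "C", "O"}),
    ("H", {"N", "CA", "C", "O"}),
    ("R", {"N", "CA", "C", "O"}),
    ("X", {"N", "CX", "C", "O"}),  # XLY 319: has CX instead of CA
    ("S", {"N", "CA", "C", "O"}),
    ("T", {"N", "CA", "C", "O"}),
    ("G", {"N", "CA", "C", "O"}),
    ("L", {"N", "CA", "C", "O"}),
]


def split_on_missing_ca(residues):
    """Return sequence strings, breaking the chain at residues without CA."""
    peptides = []
    current = []
    for code, atoms in residues:
        if "CA" in atoms:
            current.append(code)
        elif current:
            # A rejected residue ends the peptide built so far.
            peptides.append("".join(current))
            current = []
    if current:
        peptides.append("".join(current))
    return peptides


print(split_on_missing_ca(residues))
```

With the data above, the rejected residue is skipped and the residue after it
starts a new peptide, which is the behaviour I would expect from PPBuilder.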

The bug remains whether or not _accept is modified. Also, the user should not
be expected to modify the build_peptides() method whenever PPBuilder's _accept
is overridden, since the accept variable in build_peptides() isn't really a
local (private) variable: on line 277 it is bound to self._accept of PPBuilder.

http://www.biopython.org/DIST/docs/api/Bio.PDB.Polypeptide-pysrc.html
277          accept=self._accept 
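
The point about line 277 can be sketched with two toy classes. ToyBuilder and
StrictBuilder are hypothetical stand-ins, not the Biopython classes, and the
dict-based residues are an assumption made to keep the example self-contained.
Because build() looks the filter up through self, overriding _accept alone
changes what build() returns.

```python
class ToyBuilder(object):
    """Hypothetical stand-in for a peptide builder; not the Biopython class."""

    def _accept(self, residue):
        # Default rule: accept any residue that has a CA atom.
        return "CA" in residue["atoms"]

    def build(self, residues):
        # Same pattern as line 277: the "local" name is bound to self._accept,
        # so a subclass override is picked up here automatically.
        accept = self._accept
        return [r["name"] for r in residues if accept(r)]


class StrictBuilder(ToyBuilder):
    """Overrides only _accept; build() is inherited unchanged."""

    def _accept(self, residue):
        # Illustrative stricter rule: CA present and name in a short allow-list.
        allowed = ("ILE", "HIS", "ARG", "SER")
        return ToyBuilder._accept(self, residue) and residue["name"] in allowed


residues = [
    {"name": "ILE", "atoms": {"N", "CA", "C", "O"}},
    {"name": "XLY", "atoms": {"N", "CX", "C", "O"}},  # no CA atom
    {"name": "MSE", "atoms": {"N", "CA", "C", "O"}},  # non-standard, has CA
]

print(ToyBuilder().build(residues))
print(StrictBuilder().build(residues))
```

So any logic inside build_peptides() has to honour whatever _accept returns,
without the user patching build_peptides() as well.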


On a side note, the "aa_only" optional argument of build_peptides() and its
comment (@param aa_only: if 1, the residue needs to be a standard AA) are very
misleading. "aa_only" is really a flag that tells the peptide builder to filter
out residues that _accept rejects. It is on by default, and unless _accept of
PeptideBuilder is modified, every residue with a "CA" atom is accepted (lines
250-264), not only standard amino acids as the comment states. In other words,
non-standard amino acids are still accepted and included in the built peptides.
Only when the _accept method of PeptideBuilder is overridden (as I did before)
does build_peptides() exclude non-standard amino acids. I suggest renaming
"aa_only" to something more sensible, like "filter_aa".

http://www.biopython.org/DIST/docs/api/Bio.PDB.Polypeptide-pysrc.html
266      def build_peptides(self, entity, aa_only=1):
273          @param aa_only: if 1, the residue needs to be a standard AA 
274          @type aa_only: int 
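
The gap between the comment and the behaviour can be shown with two small
predicates. Both functions are hypothetical helpers written for this note, not
part of Bio.PDB; residues are reduced to a name and a set of atom names.

```python
# The 20 standard amino acids, by three-letter residue name.
STANDARD_AA = frozenset([
    "ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
    "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL",
])


def has_ca(name, atoms):
    """The check described for the default filter: a CA atom is present."""
    return "CA" in atoms


def is_standard_aa(name, atoms):
    """The check the aa_only comment promises: a standard residue name."""
    return name in STANDARD_AA


backbone = {"N", "CA", "C", "O"}

# Selenomethionine (MSE) is non-standard but has a CA atom, so the two
# checks disagree on it.
mse_by_ca = has_ca("MSE", backbone)
mse_by_name = is_standard_aa("MSE", backbone)

print(mse_by_ca)
print(mse_by_name)
```

A residue like this passes the CA check while failing the standard-AA check,
which is why the "needs to be a standard AA" wording is misleading.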

