[Biopython-dev] [Bug 3096] New: PPBuilder build_peptides bugs

bugzilla-daemon at portal.open-bio.org bugzilla-daemon at portal.open-bio.org
Tue Jun 8 22:52:28 UTC 2010


http://bugzilla.open-bio.org/show_bug.cgi?id=3096

           Summary: PPBuilder build_peptides bugs
           Product: Biopython
           Version: Not Applicable
          Platform: Other
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: skong at zymeworks.com


Given a chain of backbone-connected residues 'IXRGXTGL' that contains two
non-standard amino acids 'X', building peptides with a builder that accepts
only standard amino acids should return two peptides, 'RG' and 'TGL'. 'I'
should not be returned as a peptide since it is just one residue. Currently
Biopython returns 'IXGXGL', which reveals two bugs:

1. A standard amino acid (R or T) is skipped after each X, while the X itself
is kept. The X should be skipped instead, not R or T. Related to
http://bugzilla.open-bio.org/show_bug.cgi?id=2910 and
http://lists.open-bio.org/pipermail/biopython/2009-September/005532.html
2. A single peptide is returned even though, after filtering, the two X
residues that connected 'I', 'RG' and 'TGL' are no longer present, so the
fragment 'IRGTGL' cannot be considered a valid peptide.

The above sequence 'IXRGXTGL' is taken from 1bfe and mutated. The 'mutation'
referred to here is simply renaming a residue to a name that is not standard,
so that it is represented as 'X'.
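
The renaming described above can be sketched in a few lines. This is only an
illustration, not the procedure used for the attached files (the report does
not say how they were edited). The function name rename_residue is
hypothetical; the column positions follow the fixed-column PDB ATOM record
layout (residue name in columns 18-20, chain ID in column 22, residue number
in columns 23-26).

```python
def rename_residue(pdb_lines, chain_id, res_seq, new_name):
    """Return a copy of pdb_lines with one residue renamed.

    Hypothetical helper: only ATOM/HETATM records of the given chain and
    residue number are touched; all other lines are passed through unchanged.
    """
    if len(new_name) > 3:
        raise ValueError("PDB residue names are at most 3 characters")
    out = []
    for line in pdb_lines:
        if (line.startswith(("ATOM", "HETATM"))
                and len(line) >= 26
                and line[21] == chain_id
                and int(line[22:26]) == res_seq):
            # Columns 18-20 (0-based slice 17:20) hold the residue name.
            line = line[:17] + new_name.rjust(3) + line[20:]
        out.append(line)
    return out


# Two backbone nitrogen records copied from the 1bfe fragment in this report.
sample = [
    "ATOM    118  N   SER A 320      39.734  74.334  19.823  1.00 61.64           N",
    "ATOM    124  N   THR A 321      40.955  71.702  18.853  1.00 67.43           N",
]
mutated = rename_residue(sample, "A", 320, "XQQ")
print(mutated[0][17:20], mutated[1][17:20])  # XQQ THR
```

Only the residue name changes, so the coordinates (and therefore the backbone
connectivity) of the mutated fragment are identical to the original.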

Each solution proposed below fixes the corresponding bug above:
1. Extend the aa_only check at line 299 of Bio/PDB/Polypeptide.py so that it
tests both residues: (not accept(prev) or not accept(next)).
2. Insert pp=None at line 300 of Bio/PDB/Polypeptide.py when either of the
compared residues is filtered out.


Amino acid filtering bug in method build_peptides() of class _PPBuilder in
Bio/PDB/Polypeptide.py:

Original:
        for chain in chain_list:
            chain_it=iter(chain)
            prev=chain_it.next()
            pp=None
            for next in chain_it:
                if aa_only and not accept(prev):
                    prev=next
                    continue
                if is_connected(prev, next):
                    if pp is None:
                        pp=Polypeptide()
                        pp.append(prev)
                        pp_list.append(pp)
                    pp.append(next)
                else:
                    pp=None
                prev=next
        return pp_list
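
The reported output can be reproduced without Biopython by replaying the loop
above on a string of one-letter codes. This is a minimal sketch, not the
library code: build_original is a hypothetical name, accept() is assumed to
reject only 'X', and is_connected() is assumed to always be true, as it is for
the unbroken backbone of the test fragment.

```python
def build_original(codes, aa_only=True):
    """Replay the original build_peptides loop on one-letter codes."""

    def accept(code):
        # Assumption: 'X' stands for any non-standard residue.
        return code != "X"

    def is_connected(a, b):
        # Assumption: the whole chain has an unbroken backbone.
        return True

    pp_list = []
    chain_it = iter(codes)
    prev = next(chain_it)
    pp = None
    for nxt in chain_it:
        if aa_only and not accept(prev):
            # Only prev is tested, and pp is left open, so the residue
            # following each X is dropped while X itself was already added.
            prev = nxt
            continue
        if is_connected(prev, nxt):
            if pp is None:
                pp = [prev]
                pp_list.append(pp)
            pp.append(nxt)
        else:
            pp = None
        prev = nxt
    return ["".join(pp) for pp in pp_list]


print(build_original("IXRGXTGL"))  # ['IXGXGL']
```

Each X is appended as "next" before it is ever tested as "prev", which is why
it survives, and the residue after it is consumed by the continue branch.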


Fixed:

        for chain in chain_list:
            chain_it=iter(chain)
            prev=chain_it.next()
            pp=None
            for next in chain_it:
                if aa_only and (not accept(prev) or not accept(next)):
                    prev=next
                    pp=None
                    continue
                if is_connected(prev, next):
                    if pp is None:
                        pp=Polypeptide()
                        pp.append(prev)
                        pp_list.append(pp)
                    pp.append(next)
                else:
                    pp=None
                prev=next
        return pp_list
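
The effect of the fixed loop can be checked the same way on one-letter codes,
without Biopython. Again this is only a sketch under assumptions:
build_fixed is a hypothetical name, accept() is assumed to reject only 'X',
and is_connected() is assumed to always be true.

```python
def build_fixed(codes, aa_only=True):
    """Replay the proposed fixed build_peptides loop on one-letter codes."""

    def accept(code):
        # Assumption: 'X' stands for any non-standard residue.
        return code != "X"

    def is_connected(a, b):
        # Assumption: the whole chain has an unbroken backbone.
        return True

    pp_list = []
    chain_it = iter(codes)
    prev = next(chain_it)
    pp = None
    for nxt in chain_it:
        if aa_only and (not accept(prev) or not accept(nxt)):
            # Either residue of the pair is rejected: close the current
            # peptide so that nothing is joined across the filtered residue.
            prev = nxt
            pp = None
            continue
        if is_connected(prev, nxt):
            if pp is None:
                pp = [prev]
                pp_list.append(pp)
            pp.append(nxt)
        else:
            pp = None
        prev = nxt
    return ["".join(pp) for pp in pp_list]


print(build_fixed("IXRGXTGL"))  # ['RG', 'TGL']
```

The lone 'I' is not returned because a peptide is only started once two
accepted, connected residues are seen together.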

Attached here is the code used to test the above case, with and without
mutations, and with and without standard amino acid filtering. The case without
mutation is just to show that the backbone atoms of the mutated version are
connected:

from Bio.PDB.PDBParser import PDBParser
from Bio.PDB.Polypeptide import PPBuilder, is_aa 

class StandardAABuilder(PPBuilder): 
    """ Polypeptide builder which accepts only standard amino acids.""" 
    def _accept(self, residue): 
        return is_aa(residue, standard=True) 

def extract_peptides(model):
    """Extracts the peptides from a model.
    Returns a list of peptide sequences as strings."""
    output = []
    for peptide in PPBuilder().build_peptides(model): 
        seq = str(peptide.get_sequence())
        output.append(seq)
    return output

def extract_peptides_saa(model):
    """Extracts the peptides from a model, accepting only standard
    amino acids. Returns a list of peptide sequences as strings."""
    output = []
    for peptide in StandardAABuilder().build_peptides(model): 
        seq = str(peptide.get_sequence())
        output.append(seq)
    return output

if __name__ == '__main__':

    oripdb = open('chopped_pdb1bfe.ent')
    sto = PDBParser().get_structure('', oripdb)
    seqao = extract_peptides(sto)
    seqbo = extract_peptides_saa(sto)
    print('ori seq all ')
    print(seqao)
    print('ori seq standard only')
    print(seqbo)

    pdb = open('chopped_mutated_pdb1bfe.ent')
    st = PDBParser().get_structure('', pdb)
    seqa = extract_peptides(st)
    seqb = extract_peptides_saa(st)
    print('mut seq all')
    print(seqa)
    print('mut seq standard only ')
    print(seqb)


Below are the two PDB file fragments, before and after mutation.

chopped_pdb1bfe.ent
ATOM     85  N   ILE A 316      37.386  71.217  31.070  1.00 36.97           N  
ATOM     86  CA  ILE A 316      38.311  71.290  29.949  1.00 33.71           C  
ATOM     87  C   ILE A 316      37.634  72.103  28.862  1.00 33.93           C  
ATOM     88  O   ILE A 316      36.415  72.216  28.839  1.00 36.46           O  
ATOM     89  CB  ILE A 316      38.651  69.876  29.404  1.00 35.79           C  
ATOM     90  CG1 ILE A 316      39.331  69.049  30.501  1.00 36.78           C  
ATOM     91  CG2 ILE A 316      39.572  69.979  28.187  1.00 37.71           C  
ATOM     92  CD1 ILE A 316      39.881  67.724  30.023  1.00 39.20           C  
ATOM     93  N   HIS A 317      38.425  72.679  27.969  1.00 35.61           N  
ATOM     94  CA  HIS A 317      37.880  73.473  26.881  1.00 37.92           C  
ATOM     95  C   HIS A 317      38.360  72.928  25.540  1.00 37.79           C  
ATOM     96  O   HIS A 317      39.463  73.240  25.094  1.00 37.44           O  
ATOM     97  CB  HIS A 317      38.303  74.930  27.052  1.00 35.19           C  
ATOM     98  CG  HIS A 317      37.888  75.519  28.363  1.00 35.76           C  
ATOM     99  ND1 HIS A 317      36.611  75.981  28.602  1.00 37.74           N  
ATOM    100  CD2 HIS A 317      38.575  75.701  29.516  1.00 37.59           C  
ATOM    101  CE1 HIS A 317      36.529  76.420  29.844  1.00 38.74           C  
ATOM    102  NE2 HIS A 317      37.706  76.262  30.421  1.00 36.76           N  
ATOM    103  N   ARG A 318      37.527  72.109  24.905  1.00 38.78           N  
ATOM    104  CA  ARG A 318      37.884  71.512  23.627  1.00 42.04           C  
ATOM    105  C   ARG A 318      38.469  72.559  22.699  1.00 45.14           C  
ATOM    106  O   ARG A 318      39.592  72.425  22.205  1.00 42.05           O  
ATOM    107  CB  ARG A 318      36.657  70.880  22.967  1.00 42.93           C  
ATOM    108  CG  ARG A 318      36.934  70.321  21.576  1.00 38.60           C  
ATOM    109  CD  ARG A 318      35.654  70.038  20.821  1.00 35.39           C  
ATOM    110  NE  ARG A 318      34.624  69.538  21.724  1.00 34.96           N  
ATOM    111  CZ  ARG A 318      34.539  68.278  22.141  1.00 31.51           C  
ATOM    112  NH1 ARG A 318      35.419  67.373  21.736  1.00 25.19           N  
ATOM    113  NH2 ARG A 318      33.579  67.929  22.983  1.00 29.10           N  
ATOM    114  N   GLY A 319      37.690  73.604  22.461  1.00 49.96           N  
ATOM    115  CA  GLY A 319      38.138  74.668  21.592  1.00 55.53           C  
ATOM    116  C   GLY A 319      38.459  74.219  20.180  1.00 58.85           C  
ATOM    117  O   GLY A 319      37.583  73.766  19.440  1.00 58.98           O  
ATOM    118  N   SER A 320      39.734  74.334  19.823  1.00 61.64           N  
ATOM    119  CA  SER A 320      40.219  73.992  18.493  1.00 63.16           C  
ATOM    120  C   SER A 320      40.212  72.517  18.110  1.00 65.27           C  
ATOM    121  O   SER A 320      39.558  72.127  17.145  1.00 65.12           O  
ATOM    122  CB  SER A 320      41.634  74.542  18.316  1.00 65.36           C  
ATOM    123  OG  SER A 320      42.124  74.255  17.019  1.00 72.05           O  
ATOM    124  N   THR A 321      40.955  71.702  18.853  1.00 67.43           N  
ATOM    125  CA  THR A 321      41.049  70.274  18.562  1.00 67.73           C  
ATOM    126  C   THR A 321      40.220  69.430  19.529  1.00 66.41           C  
ATOM    127  O   THR A 321      39.244  69.917  20.095  1.00 70.21           O  
ATOM    128  CB  THR A 321      42.517  69.810  18.620  1.00 70.22           C  
ATOM    129  OG1 THR A 321      42.613  68.453  18.169  1.00 77.03           O  
ATOM    130  CG2 THR A 321      43.049  69.915  20.045  1.00 72.07           C  
ATOM    131  N   GLY A 322      40.608  68.168  19.707  1.00 61.22           N  
ATOM    132  CA  GLY A 322      39.892  67.286  20.614  1.00 53.23           C  
ATOM    133  C   GLY A 322      40.037  67.705  22.065  1.00 48.00           C  
ATOM    134  O   GLY A 322      40.138  68.892  22.372  1.00 50.41           O  
ATOM    135  N   LEU A 323      40.044  66.734  22.968  1.00 41.92           N  
ATOM    136  CA  LEU A 323      40.190  67.033  24.385  1.00 35.58           C  
ATOM    137  C   LEU A 323      41.613  66.738  24.874  1.00 31.41           C  
ATOM    138  O   LEU A 323      41.932  66.921  26.046  1.00 30.47           O  
ATOM    139  CB  LEU A 323      39.160  66.240  25.191  1.00 35.76           C  
ATOM    140  CG  LEU A 323      37.716  66.576  24.802  1.00 39.50           C  
ATOM    141  CD1 LEU A 323      36.733  65.796  25.670  1.00 38.15           C  
ATOM    142  CD2 LEU A 323      37.493  68.074  24.955  1.00 38.58           C

chopped_mutated_pdb1bfe.ent
ATOM     85  N   ILE A 316      37.386  71.217  31.070  1.00 36.97           N  
ATOM     86  CA  ILE A 316      38.311  71.290  29.949  1.00 33.71           C  
ATOM     87  C   ILE A 316      37.634  72.103  28.862  1.00 33.93           C  
ATOM     88  O   ILE A 316      36.415  72.216  28.839  1.00 36.46           O  
ATOM     89  CB  ILE A 316      38.651  69.876  29.404  1.00 35.79           C  
ATOM     90  CG1 ILE A 316      39.331  69.049  30.501  1.00 36.78           C  
ATOM     91  CG2 ILE A 316      39.572  69.979  28.187  1.00 37.71           C  
ATOM     92  CD1 ILE A 316      39.881  67.724  30.023  1.00 39.20           C  
ATOM     93  N   HIE A 317      38.425  72.679  27.969  1.00 35.61           N  
ATOM     94  CA  HIE A 317      37.880  73.473  26.881  1.00 37.92           C  
ATOM     95  C   HIE A 317      38.360  72.928  25.540  1.00 37.79           C  
ATOM     96  O   HIE A 317      39.463  73.240  25.094  1.00 37.44           O  
ATOM     97  CB  HIE A 317      38.303  74.930  27.052  1.00 35.19           C  
ATOM     98  CG  HIE A 317      37.888  75.519  28.363  1.00 35.76           C  
ATOM     99  ND1 HIE A 317      36.611  75.981  28.602  1.00 37.74           N  
ATOM    100  CD2 HIE A 317      38.575  75.701  29.516  1.00 37.59           C  
ATOM    101  CE1 HIE A 317      36.529  76.420  29.844  1.00 38.74           C  
ATOM    102  NE2 HIE A 317      37.706  76.262  30.421  1.00 36.76           N
ATOM    103  N   ARG A 318      37.527  72.109  24.905  1.00 38.78           N  
ATOM    104  CA  ARG A 318      37.884  71.512  23.627  1.00 42.04           C  
ATOM    105  C   ARG A 318      38.469  72.559  22.699  1.00 45.14           C  
ATOM    106  O   ARG A 318      39.592  72.425  22.205  1.00 42.05           O  
ATOM    107  CB  ARG A 318      36.657  70.880  22.967  1.00 42.93           C  
ATOM    108  CG  ARG A 318      36.934  70.321  21.576  1.00 38.60           C  
ATOM    109  CD  ARG A 318      35.654  70.038  20.821  1.00 35.39           C  
ATOM    110  NE  ARG A 318      34.624  69.538  21.724  1.00 34.96           N  
ATOM    111  CZ  ARG A 318      34.539  68.278  22.141  1.00 31.51           C  
ATOM    112  NH1 ARG A 318      35.419  67.373  21.736  1.00 25.19           N  
ATOM    113  NH2 ARG A 318      33.579  67.929  22.983  1.00 29.10           N  
ATOM    114  N   GLY A 319      37.690  73.604  22.461  1.00 49.96           N  
ATOM    115  CA  GLY A 319      38.138  74.668  21.592  1.00 55.53           C  
ATOM    116  C   GLY A 319      38.459  74.219  20.180  1.00 58.85           C  
ATOM    117  O   GLY A 319      37.583  73.766  19.440  1.00 58.98           O  
ATOM    118  N   XQQ A 320      39.734  74.334  19.823  1.00 61.64           N  
ATOM    119  CA  XQQ A 320      40.219  73.992  18.493  1.00 63.16           C  
ATOM    120  C   XQQ A 320      40.212  72.517  18.110  1.00 65.27           C  
ATOM    121  O   XQQ A 320      39.558  72.127  17.145  1.00 65.12           O  
ATOM    122  CB  XQQ A 320      41.634  74.542  18.316  1.00 65.36           C  
ATOM    123  OG  XQQ A 320      42.124  74.255  17.019  1.00 72.05           O
ATOM    124  N   THR A 321      40.955  71.702  18.853  1.00 67.43           N  
ATOM    125  CA  THR A 321      41.049  70.274  18.562  1.00 67.73           C  
ATOM    126  C   THR A 321      40.220  69.430  19.529  1.00 66.41           C  
ATOM    127  O   THR A 321      39.244  69.917  20.095  1.00 70.21           O  
ATOM    128  CB  THR A 321      42.517  69.810  18.620  1.00 70.22           C  
ATOM    129  OG1 THR A 321      42.613  68.453  18.169  1.00 77.03           O  
ATOM    130  CG2 THR A 321      43.049  69.915  20.045  1.00 72.07           C  
ATOM    131  N   GLY A 322      40.608  68.168  19.707  1.00 61.22           N  
ATOM    132  CA  GLY A 322      39.892  67.286  20.614  1.00 53.23           C  
ATOM    133  C   GLY A 322      40.037  67.705  22.065  1.00 48.00           C  
ATOM    134  O   GLY A 322      40.138  68.892  22.372  1.00 50.41           O  
ATOM    135  N   LEU A 323      40.044  66.734  22.968  1.00 41.92           N  
ATOM    136  CA  LEU A 323      40.190  67.033  24.385  1.00 35.58           C  
ATOM    137  C   LEU A 323      41.613  66.738  24.874  1.00 31.41           C  
ATOM    138  O   LEU A 323      41.932  66.921  26.046  1.00 30.47           O  
ATOM    139  CB  LEU A 323      39.160  66.240  25.191  1.00 35.76           C  
ATOM    140  CG  LEU A 323      37.716  66.576  24.802  1.00 39.50           C  
ATOM    141  CD1 LEU A 323      36.733  65.796  25.670  1.00 38.15           C  
ATOM    142  CD2 LEU A 323      37.493  68.074  24.955  1.00 38.58           C




