[Biopython-dev] Bio.PDB - Missing values (was Moratorium on commits?)

Peter Cock p.j.a.cock at googlemail.com
Fri Aug 23 09:05:02 UTC 2013

On Tue, Aug 20, 2013 at 11:16 PM, Lenna Peterson <arklenna at gmail.com> wrote:
> On Thu, Aug 15, 2013 at 9:23 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> I didn't mean to suggest writing the string "None" in the field, and
>> I'm not sure if João did - it would certainly be an invalid PDB file.
>> I agree that where the data structure has None (e.g. from our parser)
>> then the writer could use a blank string (of the appropriate width).
>> For mandatory fields like occupancy, this should give a warning.
> As I suspected, the writer currently fails on None (it's expecting a float).
> Test-driven development!
> However, I don't see a simple or elegant way to force writing of a blank
> occupancy. ATOM lines are currently written using C-style string formatting,
> and the occupancy field is `%6.2f`.
> Off the top of my head, I'd:
> 1. Store the original format string
> 2. Modify the format string to have "%6s" at the appropriate position
> 3. Modify the occupancy to be an empty string or a space
> 4. Set the return value to the formatted string
> 5. Restore the original format string
> 6. Return the return value
> However, this seems...ugly at best. I don't know that switching formatting
> styles (e.g. to string.format() or others) will help. And in most
> circumstances, the type checking of the format string is useful.
> Any thoughts?
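Here is a minimal reproduction of the failure described above, reduced to the occupancy field alone. The helper name `try_format_occupancy` is hypothetical and not part of PDBIO. `%6.2f` rejects None (and strings) with a TypeError, while a blank of the same width keeps every later column aligned.

```python
# The ATOM-line occupancy field uses "%6.2f" (as in PDBIO's format string).
OCCUPANCY_FORMAT = "%6.2f"


def try_format_occupancy(occupancy):
    """Return the 6-character field, or None if %-formatting rejects the value."""
    try:
        return OCCUPANCY_FORMAT % occupancy
    except TypeError:
        # "%6.2f" % None (or % "1.0") raises TypeError: it needs a real number
        return None


print(repr(try_format_occupancy(1.0)))   # '  1.00'
print(repr(try_format_occupancy(None)))  # None

# A blank of the same width keeps every later column in place
blank = " " * 6
assert len(blank) == len(OCCUPANCY_FORMAT % 1.0) == 6
```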

I would suggest something like this (untested):

$ git diff
diff --git a/Bio/PDB/PDBIO.py b/Bio/PDB/PDBIO.py
index 2f64571..11a52ca 100644
--- a/Bio/PDB/PDBIO.py
+++ b/Bio/PDB/PDBIO.py
@@ -8,7 +8,7 @@
 from Bio.PDB.StructureBuilder import StructureBuilder # To allow saving of chains, residues, etc..
 from Bio.Data.IUPACData import atom_weights # Allowed Elements

-_ATOM_FORMAT_STRING="%s%5i %-4s%c%3s %c%4i%c   %8.3f%8.3f%8.3f%6.2f%6.2f      %4s%2s%2s\n"
+_ATOM_FORMAT_STRING="%s%5i %-4s%c%3s %c%4i%c   %8.3f%8.3f%8.3f%s%6.2f      %4s%2s%2s\n"

 class Select(object):
@@ -85,8 +85,21 @@ class PDBIO(object):
         x, y, z=atom.get_coord()
+        # Handle a missing occupancy (None) with a blank entry:
+        try:
+            occupancy_str = "%6.2f" % occupancy
+        except TypeError:
+            if occupancy is None:
+                occupancy_str = " " * 6
+                import warnings
+                from Bio import BiopythonWarning
+                # TODO - Introduce exception BiopythonWriterWarning?
+                warnings.warn("Missing occupancy will be recorded as blank",
+                             BiopythonWarning)
+            else:
+                raise TypeError("Invalid occupancy %r in atom %r"
+                                % (occupancy, atom))
         args=(record_type, atom_number, name, altloc, resname, chain_id,
-            resseq, icode, x, y, z, occupancy, bfactor, segid,
+            resseq, icode, x, y, z, occupancy_str, bfactor, segid,
             element, charge)
         return _ATOM_FORMAT_STRING % args
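For anyone who wants to try the idea without patching Biopython, here is a standalone sketch of the same logic. It is an approximation, not the patch itself. It assumes the patched format string keeps the original tail after the B-factor, and it uses `UserWarning` in place of `Bio.BiopythonWarning` so it runs without Biopython installed. The ATOM field values are made up for illustration.

```python
import warnings

# Mirrors the patched _ATOM_FORMAT_STRING: occupancy becomes "%s" so that a
# pre-formatted 6-character field (number or blank) can be dropped in.
# ASSUMPTION: the tail after the B-factor is unchanged from the original string.
ATOM_FORMAT_STRING = ("%s%5i %-4s%c%3s %c%4i%c   %8.3f%8.3f%8.3f%s%6.2f"
                      "      %4s%2s%2s\n")


def format_occupancy(occupancy):
    """Return the 6-character occupancy field, blank (with a warning) if None."""
    try:
        return "%6.2f" % occupancy
    except TypeError:
        if occupancy is None:
            # The patch uses Bio.BiopythonWarning; UserWarning stands in here
            warnings.warn("Missing occupancy will be recorded as blank",
                          UserWarning)
            return " " * 6
        raise TypeError("Invalid occupancy %r" % (occupancy,))


# Hypothetical ATOM record with no occupancy
line = ATOM_FORMAT_STRING % ("ATOM  ", 1, " N  ", " ", "MET", "A", 1, " ",
                             11.104, 6.134, -6.504, format_occupancy(None),
                             0.0, "    ", " N", "  ")
assert len(line) == 81            # 80 columns plus the newline
assert line[54:60] == " " * 6     # occupancy occupies PDB columns 55-60
```

Because the number is formatted before the main `%` operation, the format string only changes in one place. Invalid non-None values still raise a TypeError, so the type checking Lenna mentioned is preserved.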

Could the error message be improved, e.g. with a more helpful identification
of the ATOM at fault?
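One possible direction, sketched with a hypothetical `describe_atom` helper, is to build a readable label from values the writer already has when it assembles the ATOM line: `name`, `altloc`, `resname`, `resseq`, `icode` and `chain_id` all appear in the `args` tuple above. Bio.PDB atoms also provide `get_full_id()`, which could serve instead, although its nested-tuple form is less friendly in an error message.

```python
def describe_atom(name, altloc, resname, resseq, icode, chain_id):
    """Return a readable label such as 'CA of GLY 42 in chain A'.

    Hypothetical helper: the arguments mirror fields the writer already has
    when it builds an ATOM line.
    """
    label = "%s of %s %i%s in chain %s" % (
        name.strip(), resname, resseq, icode.strip(), chain_id)
    if altloc.strip():
        label += " (altloc %s)" % altloc
    return label


occupancy = "n/a"  # hypothetical bad value
message = "Invalid occupancy %r for atom %s" % (
    occupancy, describe_atom(" CA ", " ", "GLY", 42, " ", "A"))
print(message)  # Invalid occupancy 'n/a' for atom CA of GLY 42 in chain A
```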

