[Biopython-dev] Bio.PDB - Missing values (was Moratorium on commits?)
p.j.a.cock at googlemail.com
Fri Aug 23 09:05:02 UTC 2013
On Tue, Aug 20, 2013 at 11:16 PM, Lenna Peterson <arklenna at gmail.com> wrote:
> On Thu, Aug 15, 2013 at 9:23 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
>> I didn't mean to suggest writing the string "None" in the field, and
>> I'm not sure if João did - it would certainly be an invalid PDB file.
>> I agree that where the data structure has None (e.g. from our parser)
>> then the writer could use a blank string (of the appropriate width).
>> For mandatory fields like occupancy, this should give a warning.
> As I suspected, the writer currently fails on None (it's expecting a float).
> Test-driven development!
> However, I don't see a simple or elegant way to force writing of a blank
> occupancy. ATOM lines are currently written using C-style string formatting,
> and the occupancy field is `%6.2f`.
> Off the top of my head, I'd:
> 1. Store the original format string
> 2. Modify the format string to have "%6s" at the appropriate position
> 3. Modify the occupancy to be an empty string or a space
> 4. Set the return value to the formatted string
> 5. Restore the original format string
> 6. Return the return value
> However, this seems...ugly at best. I don't know that switching formatting
> styles (e.g. to string.format() or others) will help. And in most
> circumstances, the type checking of the format string is useful.
> Any thoughts?
I would suggest something like this (untested):
$ git diff
diff --git a/Bio/PDB/PDBIO.py b/Bio/PDB/PDBIO.py
index 2f64571..11a52ca 100644
@@ -8,7 +8,7 @@
from Bio.PDB.StructureBuilder import StructureBuilder # To allow saving of chains, residues, etc..
from Bio.Data.IUPACData import atom_weights # Allowed Elements
-_ATOM_FORMAT_STRING="%s%5i %-4s%c%3s %c%4i%c
+_ATOM_FORMAT_STRING="%s%5i %-4s%c%3s %c%4i%c %8.3f%8.3f%8.3f%s%6.2f
@@ -85,8 +85,21 @@ class PDBIO(object):
x, y, z=atom.get_coord()
+ # Handle a missing occupancy (None) with a blank entry:
+ try:
+ occupancy_str = "%6.2f" % occupancy
+ except TypeError:
+ if occupancy is None:
+ occupancy_str = " " * 6
+ import warnings
+ from Bio import BiopythonWarning
+ # TODO - Introduce exception BiopythonWriterWarning?
+ warnings.warn("Missing occupancy will be recorded as blank",
+ else:
+ raise TypeError("Invalid occupancy %r in atom %r" %
args=(record_type, atom_number, name, altloc, resname, chain_id,
- resseq, icode, x, y, z, occupancy, bfactor, segid,
+ resseq, icode, x, y, z, occupancy_str, bfactor, segid,
return _ATOM_FORMAT_STRING % args
The error message could probably be improved, e.g. with a more helpful
identification of the atom at fault.
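
For anyone wanting to try the idea outside of PDBIO.py, here is a minimal standalone sketch of the same approach. The helper name format_occupancy and its atom_label argument are hypothetical (they are not part of Bio.PDB), and it uses the built-in UserWarning rather than BiopythonWarning so that it runs without Biopython installed:

```python
import warnings


def format_occupancy(occupancy, atom_label="?"):
    """Return the occupancy as a 6-character PDB field.

    Hypothetical helper for illustration only, not part of Bio.PDB.
    None becomes six spaces (with a warning); any other value that
    cannot be formatted as a float raises TypeError.
    """
    try:
        return "%6.2f" % occupancy
    except TypeError:
        if occupancy is None:
            # Assumption: UserWarning stands in for BiopythonWarning here.
            warnings.warn(
                "Missing occupancy in atom %s written as blank" % atom_label,
                UserWarning,
            )
            return " " * 6
        raise TypeError(
            "Invalid occupancy %r in atom %s" % (occupancy, atom_label)
        )


# The pre-formatted field is then passed to a "%s" placeholder, so the
# rest of the format string keeps its usual type checking.
fields = "%8.3f%s%6.2f" % (1.234, format_occupancy(None, "CA"), 20.0)
```

The point of formatting the occupancy separately is that only this one field changes from %6.2f to %s; the column widths stay the same and there is no need to swap the module-level format string in and out.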