I still feel this was a minor change (although of
course important to some, including you). This is
parsing of malformed PDB files where the user
ALREADY gets a warning (or error in strict mode,
where there would be no functional change) that
there is a problem with the occupancy data.
One reason why I specifically talked about small
commits (in the sense of a simple diff) above is
they are trivial to revert if the need arises, or as
in this case, modify:
https://github.com/biopython/biopython/commit/500c3c2ea900fd8c8f5123f571d4d9a244ee898e
This change was suggested and supported by
people who've been actively contributing to the
Biopython structural module for some time, so I
had reason to trust their good judgement, and as
I wrote at the time there was a clear consensus
with three people in all happy with the idea:
http://lists.open-bio.org/pipermail/biopython-dev/2013-August/010773.html
I respect that you listen more to developers that have been contributing
for a long time. That is quite understandable, but I hope that does not
prevent me from contributing my opinions.
What prompted my response was the suggestion that the occupancy should be
set to 1.0 if it is absent from the file, i.e. if the PDB file is
malformed. I think that is an incorrect behavior, and I say that not as a
core developer, but as a crystallographer. If invalid data is present in
the file, you do not want the toolkit transforming it to valid data.

I appreciate the physical/practical feedback about the commits.

After thinking about it, the suggestion to set values to None when they are
not defined in a malformed file now appears quite reasonable, but if it is
done this way with occupancies, it should also be done this way with
B-factors, chain identifiers and other values that are mandatory in the
file according to the format specs. From the user's perspective, if the
values returned are None, you are alerted to the fact that something is
wrong, and you should make an appropriate choice, whatever that may be.
I agree that `None` is a good warning value for missing data.
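
To make the user-side consequence concrete, here is a minimal sketch of code that receives occupancies where some may be None. It is plain Python and does not use the Biopython API; the function name summarize_occupancies is hypothetical and only for illustration.

```python
from typing import Iterable, Optional, Tuple


def summarize_occupancies(
    values: Iterable[Optional[float]],
) -> Tuple[Optional[float], int]:
    """Return (mean of the defined occupancies, number of missing ones).

    Hypothetical helper: None marks an occupancy that was not defined
    in the (malformed) input file.
    """
    defined = []
    missing = 0
    for value in values:
        if value is None:
            # The caller is forced to notice and decide what to do.
            missing += 1
        else:
            defined.append(value)
    mean = sum(defined) / len(defined) if defined else None
    return mean, missing


mean, missing = summarize_occupancies([1.0, None, 0.5])
if missing:
    print(f"{missing} atom(s) had no occupancy; mean of the rest: {mean}")
```

Had the missing value been silently replaced with 1.0, the same loop would have run without any sign that the input was malformed.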

I just skimmed the code and summarized how some of the missing values are
handled:

* Serial number: 0
* Chain: fatal in both strict and permissive modes (i.e. no try/except)
* Coordinates: fatal in both strict and permissive modes
* Occupancy: we recently decided to set as None in permissive
* B-factor: 0.0 in permissive (code comment states this is PDB default)
* Model seq id: 0
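
For illustration only, here is a rough sketch of what a uniform "None in permissive mode, error in strict mode" policy could look like for two of the numeric fields. This is not the actual PDBParser code; the function names are hypothetical. The column positions (occupancy in columns 55-60, B-factor in columns 61-66) follow the PDB ATOM record layout. Note that it treats a missing B-factor as None as well, which differs from the 0.0 the current code uses.

```python
from typing import Optional, Tuple


def parse_optional_float(field: str, permissive: bool = True) -> Optional[float]:
    """Parse a fixed-width field; hypothetical helper, not Biopython API."""
    try:
        return float(field)
    except ValueError:
        if permissive:
            # Missing or garbled value: report it as None, do not invent data.
            return None
        raise


def read_occupancy_and_bfactor(
    line: str, permissive: bool = True
) -> Tuple[Optional[float], Optional[float]]:
    """Extract occupancy (cols 55-60) and B-factor (cols 61-66) from an ATOM line."""
    occupancy = parse_optional_float(line[54:60], permissive)
    bfactor = parse_optional_float(line[60:66], permissive)
    return occupancy, bfactor


# Synthetic lines: only the two columns of interest are filled in.
good_line = " " * 54 + "  1.00" + " 20.50"
bad_line = " " * 54 + "      " + " 20.50"  # occupancy field left blank

print(read_occupancy_and_bfactor(good_line))
print(read_occupancy_and_bfactor(bad_line))
```

In strict mode (permissive=False) the blank occupancy field raises ValueError instead of returning None.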

The StructureBuilder class also has certain ways of handling duplicate
residues and atoms that I'm not particularly familiar with. For example,
I'm not quite sure what will happen if successive atoms have missing serial
numbers.

PDB is a format where there's always a balance between absolute adherence
to the format and enough flexibility to deal with the wide range of
malformed files.


