[Biopython-dev] [GSoC] GSoC python variant update

Eric Talevich eric.talevich at gmail.com
Thu May 10 13:36:49 UTC 2012

On Wed, May 9, 2012 at 8:16 PM, Lenna Peterson <arklenna at gmail.com> wrote:

> I looked at the structure of PDBParser. Is the idea that a user might pass
> in an instance of StructureBuilder that already contained some structure
> and add to it? Or is there another purpose that isn't jumping out at me? In
> my skeleton code, I used the example of StructureBuilder, but I'm not sure
> if there's an advantage to passing the object rather than the object's name.

My understanding of the producer/consumer design in Bio.PDB (I didn't write
it) is that the logic for parsing the given file format is contained in the
*Parser class, and the logic for building the target object is in the
*Builder class. This is useful if the target object is somewhat complex to
build, as is the case with PDB's Structure/Model/Chain/Residue/Atom
hierarchy -- the parser just passes raw values along to the appropriate
method on the StructureBuilder class.
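
As a rough sketch of that split (every name here is made up for
illustration, and the record format is a simplified whitespace-separated
stand-in, not real PDB columns or the actual Bio.PDB code), the parser only
knows how to read lines and the builder only knows how to assemble the
nested result:

```python
# Toy illustration of the parser/builder (producer/consumer) split.
# ToyBuilder, ToyParser, and the record layout are hypothetical.

class ToyBuilder:
    """Consumer: assembles a nested chain -> residue -> atoms mapping."""

    def __init__(self):
        self.structure = {}
        self._chain = None
        self._residue = None

    def init_chain(self, chain_id):
        self._chain = self.structure.setdefault(chain_id, {})

    def init_residue(self, resname, resseq):
        self._residue = self._chain.setdefault((resname, resseq), [])

    def init_atom(self, name, coord):
        self._residue.append((name, coord))

    def get_structure(self):
        return self.structure


class ToyParser:
    """Producer: knows the line format, nothing about the target object."""

    def __init__(self, builder=None):
        # Create a fresh builder per parser unless the caller supplies one.
        self.builder = builder if builder is not None else ToyBuilder()

    def parse(self, lines):
        current_chain = current_residue = None
        for line in lines:
            # Assumed layout: chain resname resseq atom x y z
            chain_id, resname, resseq, atom, x, y, z = line.split()
            if chain_id != current_chain:
                self.builder.init_chain(chain_id)
                current_chain, current_residue = chain_id, None
            if (resname, resseq) != current_residue:
                self.builder.init_residue(resname, int(resseq))
                current_residue = (resname, resseq)
            # The parser just hands raw values to the builder.
            self.builder.init_atom(atom, (float(x), float(y), float(z)))
        return self.builder.get_structure()


records = [
    "A GLY 1 N 0.0 0.0 0.0",
    "A GLY 1 CA 1.5 0.0 0.0",
    "A ALA 2 N 3.0 1.0 0.0",
]
structure = ToyParser().parse(records)
```

Swapping in a different builder would change what gets built without
touching the parsing logic, which I take to be the point of the design.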

(The Internet also points out that this design is super useful if
"producing" and "consuming" are asynchronous, which is not the case here.)

Regarding the shared interface, I think we've generally achieved this
throughout most of Biopython by just remembering to implement the required
methods on each parser and writer class -- just "parse" and "write",
usually. Essentially, it's your design minus the common base class that
enforces the interface; an error in the implementation would result in an
AttributeError rather than a NotImplementedError. This works because (1)
Python uses duck typing, unlike C++ and Java; (2) in Biopython, each file
format is usually implemented by one dedicated person who can keep it all
in their head, and we don't add new file formats very rapidly; (3) we
maintain pretty good coverage with our unit tests, and certainly add unit
tests for new parsers.

Given all that, I think your design is superior, and it's quite clear how
it all works from the way you've written it.
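
To make the AttributeError versus NotImplementedError point concrete, here
is a small sketch. The class names are hypothetical and stand in for a
parser that forgot to implement "parse", with and without a common base
class:

```python
# Hypothetical classes showing how a missing parse() method fails
# with and without a shared base class.

class BaseParser:
    """Common base class that spells out the required interface."""

    def parse(self, handle):
        raise NotImplementedError("subclasses must implement parse()")


class IncompleteSubclass(BaseParser):
    """Inherits the interface but forgets to override parse()."""


class IncompleteDuck:
    """Duck-typed parser with no base class and no parse() at all."""


def error_name(parser):
    """Call parse() and report which exception type was raised, if any."""
    try:
        parser.parse(None)
    except (AttributeError, NotImplementedError) as err:
        return type(err).__name__
    return None


base_error = error_name(IncompleteSubclass())  # "NotImplementedError"
duck_error = error_name(IncompleteDuck())      # "AttributeError"
```

Either way the mistake surfaces the first time someone calls the method;
the base class mainly makes the error message say what was intended.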

As for the difference between passing an instance of the *Builder object
versus a reference to the *Builder class (did I get that right?), it
requires slightly less code from the user to pass a reference to the class.
Also, if you set the object-or-class as a default argument, remember that
a Builder instance is mutable, so you risk hitting one of Python's most
infamous gotchas (default arguments are evaluated only once, when the
function is defined, so the second time you use the parser, you'll be
adding to the original object instead of starting with a fresh copy).
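
A minimal sketch of that gotcha, using made-up names (ToyBuilder,
parse_shared, and parse_fresh are only for illustration): passing an
instance as the default shares state across calls, while passing the class
and instantiating it inside the function does not.

```python
# Hypothetical example of the mutable default argument gotcha.

class ToyBuilder:
    def __init__(self):
        self.items = []


def parse_shared(lines, builder=ToyBuilder()):
    # The default ToyBuilder() is created once, when the function is
    # defined, so every call that omits `builder` reuses the same object.
    builder.items.extend(lines)
    return builder.items


def parse_fresh(lines, builder_class=ToyBuilder):
    # Defaulting to the class and instantiating it here gives each call
    # its own builder.
    builder = builder_class()
    builder.items.extend(lines)
    return builder.items


shared_first = list(parse_shared(["a"]))   # ["a"]
shared_second = list(parse_shared(["b"]))  # ["a", "b"]: leftover state
fresh_first = parse_fresh(["a"])           # ["a"]
fresh_second = parse_fresh(["b"])          # ["b"]
```

The other common workaround is a default of None with the instance created
inside the function when nothing is passed.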

