[Biopython-dev] pull request: Handle MMCIF with multiple models (closes 2943)
eric.talevich at gmail.com
Tue Apr 24 15:38:50 UTC 2012
On Tue, Apr 24, 2012 at 12:25 AM, Lenna Peterson <arklenna at gmail.com> wrote:
> On Mon, Apr 23, 2012 at 4:10 PM, Eric Talevich <eric.talevich at gmail.com> wrote:
>> Ack, I didn't look at that closely enough. Check out this patch to see
>> the current situation:
>> The models associated with a structure are numbered with a sequential
>> integer id, starting from 0. It's always been like that in our PDB
>> parser and we haven't changed it. To ensure that model numbers
>> specified in the PDB file are preserved when writing the PDB back to
>> file, the above patch introduced a new attribute on the Model object
>> called serial_num (also an integer, equal to model.id unless specified
>> otherwise). That attribute is only used when writing a new PDB file;
>> Model.__getitem__ still uses Model.id as before.
>> Perhaps that's surprising now that we read the serial numbers, but it
>> kept backward compatibility. Plus, it preserves list-like behavior
>> (item access via integers), even though the models are actually stored
>> in a dict.
>> In the mmCIF parser, the calls to structure_builder.init_model should
>> be given two arguments instead of one: an integer id counting from 0,
>> and then another integer (probably) containing the model "serial
>> number" specified in the mmCIF file. In the event that an mmCIF file
>> doesn't specify the model number, the serial number should be the same
>> as the sequential id.
>> Cool? This will also help us convert between PDB and mmCIF formats in
>> the future.
>> As for accessing the models by their serial number, using string keys
>> seems like an effective workaround, but still obviously a workaround
>> rather than an ideal situation. Let's discuss that a little more, and
>> perhaps file another bug once we've reached some consensus.
> Hi Eric,
> I believe I've implemented the model_id/serial_id system found in PDB:
> Please let me know if you think that looks right. I couldn't find an
> mmCIF file without a model column to test, but I believe in that case
> it will assign model_id and serial_id to 0. Would that be the correct
> behavior?
> I also modified the unit test to check the model serial_num.
> Currently serial_num is int() of the CIF model column. Regarding
> access by string serial_num, I am concerned that the int/string access
> would be too subtle (structure == structure['1']; structure ==
> structure['2']?). Perhaps an accessor function? i.e.
> Let me know if you think I should write get_model() or something along
> those lines.
I left another nitpick on b453a, but besides that it looks exactly right to me.
The string/int distinction would indeed be weird, especially for newer
analogue for get_model(serial_num) in the other Entities (Residue,
Chain, Model, Structure), so I'm inclined to put off the decision for
now (i.e. leave it out of this patch set).
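
For reference, here is a rough sketch of the id/serial_num scheme discussed in this thread. It uses toy stand-in classes, not the real Bio.PDB code, and the get_model() accessor is purely hypothetical (it is the idea floated above, not something in the patch). The serial numbers 1, 2, 5 are made up for illustration.

```python
class Model:
    """Toy stand-in for a model entity (not Biopython's real class)."""

    def __init__(self, id, serial_num=None):
        self.id = id
        # Fall back to the sequential id when the file gives no model number.
        self.serial_num = id if serial_num is None else serial_num


class Structure:
    """Toy container: models live in a dict keyed by sequential integer id."""

    def __init__(self):
        self._models = {}

    def init_model(self, model_id, serial_num=None):
        # Two arguments: the 0-based sequential id, then the file's serial number.
        self._models[model_id] = Model(model_id, serial_num)

    def __getitem__(self, model_id):
        # Item access still goes through the sequential id, as before.
        return self._models[model_id]

    def get_model(self, serial_num):
        # Hypothetical accessor: look a model up by its file serial number.
        for model in self._models.values():
            if model.serial_num == serial_num:
                return model
        raise KeyError(serial_num)


structure = Structure()
# Assumed example: a file whose model column reads 1, 2, 5.
for model_id, serial_num in enumerate([1, 2, 5]):
    structure.init_model(model_id, serial_num)

print(structure[0].serial_num)
print(structure.get_model(5).id)
```

Keeping the lookup in a separate method avoids overloading __getitem__ with both int and string keys, which is the subtlety raised above.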