[Biopython-dev] Questions on StructureBuilder, MMCIFParser, and MMCIFlex

Paul B tallpaulinjax at yahoo.com
Mon Nov 2 22:03:51 UTC 2009


Hi Peter,
 
I have attached drafts of MMCIFlex.py and MMCIFParser.py. Each has a __main__ block that performs decent testing. On my system, I have replaced their same-named counterparts in the appropriate folders. Please note, however, that this version of MMCIFlex.py and MMCIFParser.py must work together as a pair! So I don't know how you guys would handle that: give them new names, or replace the old files?
 
I can't test them further right now because I believe MMCIFParser needs corrections. For example, PDBParser.py calls the following methods on its StructureBuilder object:
structure_builder.init_structure
structure_builder.set_header
structure_builder.set_line_counter
structure_builder.init_model
structure_builder.init_seg
structure_builder.init_chain
structure_builder.init_residue
structure_builder.init_atom
structure_builder.set_anisou
structure_builder.set_siguij
structure_builder.set_sigatm

 
However, MMCIFParser only calls:
structure_builder.init_structure
structure_builder.init_model
structure_builder.init_seg
structure_builder.init_chain
structure_builder.init_residue
structure_builder.init_atom
structure_builder.set_anisou

 
leaving out calls to:
structure_builder.set_header
structure_builder.set_line_counter
structure_builder.set_siguij
structure_builder.set_sigatm

 
I believe the last two might be important for some people. I don't know yet whether the first two are just housekeeping; I'm still checking. So I am still looking into MMCIFParser, in particular why it's bombing when creating a structure from 2beg.cif while PDBParser works correctly on pdb2beg.ent.
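
The comparison above can be reproduced mechanically. The following is a minimal sketch that only uses the method names listed in this message; the RecordingBuilder class is a hypothetical stand-in (not part of Bio.PDB) that could be handed to a parser to log which builder methods it actually calls.

```python
# Method names copied from the two lists in this message.
pdb_calls = {
    "init_structure", "set_header", "set_line_counter", "init_model",
    "init_seg", "init_chain", "init_residue", "init_atom",
    "set_anisou", "set_siguij", "set_sigatm",
}
mmcif_calls = {
    "init_structure", "init_model", "init_seg", "init_chain",
    "init_residue", "init_atom", "set_anisou",
}

# Calls made by PDBParser but not by MMCIFParser.
missing = sorted(pdb_calls - mmcif_calls)
print(missing)


class RecordingBuilder:
    """Hypothetical stand-in builder that records every method called on it."""

    def __init__(self):
        self.called = set()

    def __getattr__(self, name):
        # Only reached for attributes that do not exist, i.e. builder methods.
        def record(*args, **kwargs):
            self.called.add(name)
        return record


builder = RecordingBuilder()
builder.init_structure("2beg")
builder.init_model(0)
print(sorted(builder.called))
```

A recorder like this returns nothing useful from its methods, so it is only suitable for listing calls, not for building a real structure.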
 
Paul

--- On Mon, 11/2/09, Paul B <tallpaulinjax at yahoo.com> wrote:


From: Paul B <tallpaulinjax at yahoo.com>
Subject: Re: [Biopython-dev] Questions on StructureBuilder, MMCIFParser, and MMCIFlex
To: biopython-dev at biopython.org
Date: Monday, November 2, 2009, 8:21 AM

I'll use the conventional response technique in future emails! :-)
 
Hi Peter,
 
1. "Did you mean to not CC the list?": Sorry, I replied to your email 
address instead of the CC: address! 
2. Peter: "I should be able to run the flex code and your new code side by side,
for testing and profiling. Not sure when I'll find the time exactly, but
we'll see. Examples will help as while I know plenty about PDB files,
I've not used CIF at all": I'd be glad to run the tests myself as well 
and I have the time! :-) But without the flex module installed and 
operational, the only way I can think of is with pickled .cif dicts.
3. Peter: "P.S. Are you OK with making this contribution under the Biopython
license?" Absolutely, I'd be glad to contribute to Biopython!
 
This was in response to my followup email to Peter:
"Hi Peter:

Paul: So I re-wrote MMCIFlex strictly in Python to emulate (the lex based MMCIFlex)

Peter: Now that would be very handy (IMO), if you can get it working.
Have you benchmarked it against the flex code? Have you been able 
to test the flex code? If not, could you give me a tiny script using the 
2beg cif file which should work? If that works, then the problem is in 
your flex replacement code.

Paul: It already works, but I have no way to benchmark it against the
flex code myself. Perhaps someone could pickle the dicts parsed from a half 
dozen PDB .cif files and send me the resultant files? I can then run a test 
against each one. I'll also clean up the code in both the new MMCIFlex.py and 
the changed MMCIF2Dict.py and send them to you, most probably by today. 
Each will have a __main__ block for testing."
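
The pickle-based comparison proposed above might look roughly like the sketch below. It assumes both parsers produce a plain dict keyed by CIF tag; the diff_dicts helper and the two small dicts are made-up stand-ins for illustration, not real parser output.

```python
import pickle


def diff_dicts(reference, candidate):
    """Return (missing, extra, changed) keys of candidate relative to reference."""
    ref_keys, cand_keys = set(reference), set(candidate)
    missing = sorted(ref_keys - cand_keys)
    extra = sorted(cand_keys - ref_keys)
    changed = sorted(k for k in ref_keys & cand_keys
                     if reference[k] != candidate[k])
    return missing, extra, changed


# Toy stand-ins: one dict as the flex-based parser might produce it,
# one as the pure-Python replacement might produce it.
flex_result = {"_entry.id": "2BEG", "_atom_site.group_PDB": ["ATOM", "ATOM"]}
python_result = {"_entry.id": "2BEG", "_atom_site.group_PDB": ["ATOM"]}

# Whoever has flex installed pickles the reference dict...
blob = pickle.dumps(flex_result, protocol=2)

# ...and the recipient unpickles it and compares against the new parser.
reference = pickle.loads(blob)
report = diff_dicts(reference, python_result)
print(report)
```

In practice the blob would be written to and read from a file, and pickles should only be loaded from a trusted sender.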
 

--- On Sun, 11/1/09, Peter <biopython at maubp.freeserve.co.uk> wrote:


From: Peter <biopython at maubp.freeserve.co.uk>
Subject: Re: [Biopython-dev] Questions on StructureBuilder, MMCIFParser, and MMCIFlex
To: "Paul B" <tallpaulinjax at yahoo.com>
Cc: biopython-dev at biopython.org
Date: Sunday, November 1, 2009, 4:28 PM


On Sun, Nov 1, 2009 at 7:50 PM, Paul B <tallpaulinjax at yahoo.com> wrote:
>
> Hi,
>
> I'm a computer science guy trying to figure out some chemistry logic
> to support my thesis, so bear with me! :-) To sum it up, I'm not sure
> MMCIFParser is handling ATOM and MODEL records correctly
> because of this code in MMCIFParser:
>             if fieldname=="HETATM":
>                 hetatm_flag="H"
>             else:
>                 hetatm_flag=" "
> This causes ATOM (and potentially MODEL) records to die as seen
> in the exception below (I think!)
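
Taken in isolation, the quoted branch maps every value other than "HETATM", including "ATOM", to a blank flag, so it may not be the cause of the exception by itself. The sketch below wraps the same logic in a hypothetical helper (hetatm_flag_for is not a Bio.PDB function) to make that visible. In mmCIF files the model number is normally carried in its own column (_atom_site.pdbx_PDB_model_num) rather than in a MODEL record.

```python
def hetatm_flag_for(fieldname):
    """Same branch as the quoted MMCIFParser snippet, as a function."""
    if fieldname == "HETATM":
        return "H"
    # Anything else, including "ATOM", gets the blank flag.
    return " "


flags = {name: hetatm_flag_for(name) for name in ("ATOM", "HETATM")}
print(flags)
```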

I'll answer that below.

> My questions are:
> 1. Am I correct that the current code is insufficient?
> 2. What additional logic beyond just recognizing whether it's a
> HETATM, ATOM or MODEL record needs to be added?
>
> Thanks!
>
> Paul
>
>
> Background:
> I understand MMCIFlex.py et cetera is commented out in the
> Windows setup.py package due to difficulties compiling it.

It is commented out (on all platforms) because we don't know
how to get setup.py to detect if flex and the relevant headers
are installed, which we would need to compile the code. I'm
not sure how this would work on Windows with an installer
(i.e. what is a run time dependency versus compile time).
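
One possible way to detect the flex executable from a setup script is sketched below. This is only an assumption about how such a check could be done: it uses shutil.which (available in Python 3.3 and later; distutils.spawn.find_executable played a similar role in older versions), it does not check for headers or libraries, and the names find_tool and BUILD_MMCIF_LEX are hypothetical.

```python
import shutil


def find_tool(name):
    """Return the full path of an executable on PATH, or None if absent."""
    return shutil.which(name)


flex_path = find_tool("flex")

# Hypothetical switch a setup script could use to decide whether to
# include the flex-based extension at all.
BUILD_MMCIF_LEX = flex_path is not None

if BUILD_MMCIF_LEX:
    print("flex found at %s" % flex_path)
else:
    print("flex not found; skipping the MMCIFlex extension")
```

This only answers the compile-time question on the build machine; a binary installer would still have to ship the already compiled extension.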

> So I re-wrote MMCIFlex strictly in Python to emulate what

Now that would be very handy (IMO), if you can get it working.
Have you benchmarked it against the flex code?

> I THINK the original MMCIFlex did. My version processes
> a .cif file one line at a time (readline()) then passes tokens
> back to MMCIF2Dict at  each call to get_token(). That
> seems to work fine for unit testing of my MMCIFlex and
> MMCIFDict which I had to slightly re-write (to ensure it
> handled how I passed SEMICOLONS line back etc).
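
A line-at-a-time tokenizer of the kind described above could be sketched as follows. This is not the attached MMCIFlex.py; it is a simplified illustration that assumes only bare words, quoted strings, "#" comments and semicolon-delimited multi-line text fields need handling. The names iter_tokens and split_line are hypothetical.

```python
import io


def split_line(line):
    """Yield the tokens found on one ordinary (non text-field) line."""
    pos, n = 0, len(line)
    while pos < n:
        ch = line[pos]
        if ch.isspace():
            pos += 1
        elif ch == "#":
            return  # the rest of the line is a comment
        elif ch in "'\"":
            # A quote only closes the string when followed by whitespace
            # or the end of the line, so O5' can appear inside "O5'".
            end = pos + 1
            while end < n and not (
                line[end] == ch and (end + 1 == n or line[end + 1].isspace())
            ):
                end += 1
            yield line[pos + 1:end]
            pos = end + 1
        else:
            end = pos
            while end < n and not line[end].isspace():
                end += 1
            yield line[pos:end]
            pos = end


def iter_tokens(handle):
    """Yield tokens one at a time while reading the handle line by line."""
    text_lines = None  # not None while inside a ';' delimited text field
    for raw in handle:
        line = raw.rstrip("\r\n")
        if text_lines is not None:
            if line.startswith(";"):
                yield "\n".join(text_lines)
                text_lines = None
            else:
                text_lines.append(line)
        elif line.startswith(";"):
            text_lines = [line[1:]]
        else:
            yield from split_line(line)


sample = (
    "data_2BEG\n"
    "_entry.id 2BEG\n"
    "ATOM 'CA'\n"
    "ATOM \"O5'\"\n"
    "_struct.title\n"
    ";Two-line\n"
    "title\n"
    ";\n"
    "# trailing comment\n"
)
tokens = list(iter_tokens(io.StringIO(sample)))
print(tokens)
```

A get_token() style interface can be emulated by keeping the generator around and calling next(generator, None) each time a token is wanted.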
>
> However when I try and use this with MMCIFParser
> against the 2beg .cif file which has no HETATM records
> and, as I understand the definition, no disordered atoms
> I get:
>
> ...
>
> Basically what I think MIGHT be happening is MMCIFParser
> is currently only handling HETATM records, when some other
> kind of record comes in (ATOM, MODEL) it is treated
> incorrectly. See below.
>
> ...

Have you been able to test the flex code? If not, could you
give me a tiny script using the 2beg cif file which should
work? If that works, then the problem is in your flex
replacement code.

Peter
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: MMCIF2Dict.py
URL: <http://lists.open-bio.org/pipermail/biopython-dev/attachments/20091102/78d51c70/attachment.ksh>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: MMCIFlex.py
URL: <http://lists.open-bio.org/pipermail/biopython-dev/attachments/20091102/78d51c70/attachment-0001.ksh>

