[Biopython-dev] Fwd: Feature: Python implementation of MMCIF parser (#33)

Lenna Peterson arklenna at gmail.com
Sat Apr 21 00:57:21 UTC 2012


On Fri, Apr 20, 2012 at 4:39 AM, Peter Cock <p.j.a.cock at googlemail.com> wrote:
> I've had a quick look on GitHub and it isn't obvious to me how to get
> pull request emails CC'd to our dev mailing list... but anyway, Lenna
> has been busy:
>
> Peter
>
> ---------- Forwarded message ----------
> From: Lenna Peterson
> <reply+i-4201999-d8628b2a34f52e923e8471a792110c2edfbe13a8-63959 at reply.github.com>
> Date: Thu, Apr 19, 2012 at 11:35 PM
> Subject: [biopython] Feature: Python implementation of MMCIF parser (#33)
> To: Peter Cock <p.j.a.cock at googlemail.com>
>
>
> I've written a PLY (Python Lex-Yacc) module that is interchangeable
> with the C MMCIF module.
>
> I've also partially rewritten the C MMCIF module to be object-oriented.
>
> ### Changed files ###
>
> * MMCIFlexmodule.c: Now object-oriented (open file in constructor,
> close file in destructor, etc). Docstrings! Added file IO exception.
> * MMCIF2Dict.py: Minor changes for new object oriented API
> * MMCIFParser.py: Changed all uses of map() to list comprehensions (more
> compatible with Python 3)
>
> ### New files ###
>
> * MMCIFlex.py: PLY-based module for tokenizing input.
>
> ### What it needs ###
> Addition of PLY dependency to setup.py.
> I'm not quite sure how to handle this, as PLY wouldn't be necessary on
> a platform with CPython. Thoughts? Which non-CPython implementations
> are worth testing?
>
>
> New C module tested on Python 2.6 on Mac OS X and Debian. I hope it
> still works on Windows.
> On my machine, the C module processes a 30,000 line test file in 10-15
> ms; the Python module takes ~150 ms.


I've started testing the PLY lexer on PyPy. NumPyPy now implements
more functions needed by PDB; the only things I found to be missing
are random and linalg. Without them, Superimposer, FragmentMapper,
and Vector cannot be used under PyPy yet.
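
As a rough sketch (this is not code from the branch), a check like the
following could list which submodules fail to import under a given
interpreter. The helper name `missing_submodules` is made up for
illustration.

```python
import importlib


def missing_submodules(package, names):
    """Return the names in `names` that cannot be imported as package.name.

    Hypothetical helper: it only tries the import and records failures.
    """
    missing = []
    for name in names:
        try:
            importlib.import_module(package + "." + name)
        except ImportError:
            # Covers both a missing submodule and a missing package.
            missing.append(name)
    return missing


if __name__ == "__main__":
    # The result depends on the interpreter and on what is installed.
    print(missing_submodules("numpy", ["random", "linalg"]))
```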

I played around with trying to spoof "import numpy" to automatically
import numpypy (code here: https://gist.github.com/2432815) but I
don't think that's wise yet.
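
For readers who don't want to open the gist: the general idea of this
kind of spoofing can be sketched by registering one module under another
name in `sys.modules`. This is a minimal illustration and not
necessarily what the gist does; `alias_module` is a hypothetical name.

```python
import importlib
import sys


def alias_module(alias, target):
    """Make `import alias` return the module named `target`.

    Returns True if `alias` now refers to `target`, and False if `target`
    cannot be imported or `alias` is already bound to another module.
    """
    try:
        module = importlib.import_module(target)
    except ImportError:
        return False
    # Do not clobber a real module that was already imported as `alias`.
    sys.modules.setdefault(alias, module)
    return sys.modules[alias] is module


if __name__ == "__main__":
    # Under PyPy with numpypy available this would alias it to "numpy";
    # elsewhere it simply reports False or leaves the real numpy alone.
    print(alias_module("numpy", "numpypy"))
```

An alias like this only covers the top-level name, so submodule imports
may still behave unexpectedly, which is part of why it seems fragile.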

My last commit to this branch was a few changes to allow the MMCIF
parser to work on NumPyPy. PyPy won't run `setup.py test` due to the
global numpy import failure, but if I install this branch and run
`pypy test_MMCIF.py`, it passes.

Anybody with more PyPy and/or package structuring experience have thoughts?
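
On the packaging side, one pattern that might fit (a sketch under
assumptions, not something in the branch) is to try the C lexer first
and fall back to the PLY one only when the C module is unavailable. The
function name and the module names in the usage comment are
hypothetical.

```python
import importlib


def load_first_available(candidates):
    """Import and return the first module in `candidates` that imports."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError("none of %s could be imported" % ", ".join(candidates))


# Hypothetical usage; these module names are placeholders:
# lexer = load_first_available(["_mmcif_lex_c", "_mmcif_lex_ply"])
```

With something like this, PLY would only need to be installed on
platforms where the C extension cannot be built.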

Lenna


