[Biopython-dev] Fwd: Feature: Python implementation of MMCIF parser (#33)

Peter Cock p.j.a.cock at googlemail.com
Fri Apr 20 08:39:02 UTC 2012


I've had a quick look on GitHub and it isn't obvious to me how to get
pull request emails CC'd to our dev mailing list... but anyway, Lenna
has been busy:

Peter

---------- Forwarded message ----------
From: Lenna Peterson
<reply+i-4201999-d8628b2a34f52e923e8471a792110c2edfbe13a8-63959 at reply.github.com>
Date: Thu, Apr 19, 2012 at 11:35 PM
Subject: [biopython] Feature: Python implementation of MMCIF parser (#33)
To: Peter Cock <p.j.a.cock at googlemail.com>


I've written a PLY (Python lex-yacc) module that is interchangeable
with the C MMCIF module.

I've also partially rewritten the C MMCIF module to be object-oriented.

### Changed files ###

* MMCIFlexmodule.c: Now object-oriented (open file in constructor,
close file in destructor, etc). Docstrings! Added file IO exception.
* MMCIF2Dict.py: Minor changes for new object-oriented API
* MMCIFParser.py: Changed all uses of map() to list comprehensions (more
compatible with Python 3)
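
For illustration, the map() change matters because map() returns a lazy iterator on Python 3 rather than a list, so code that indexes or measures the result breaks. The snippet below is a minimal sketch with made-up data; `fields` is a hypothetical name, not taken from MMCIFParser.py.

```python
# Hypothetical coordinate strings; not data from the actual parser.
fields = ["1.0", "2.5", "3.25"]

# On Python 3 this is a lazy iterator: no len(), no indexing.
as_map = map(float, fields)

# A list comprehension yields a real list on both Python 2 and Python 3.
as_list = [float(value) for value in fields]

second = as_list[1]    # indexing works regardless of Python version
count = len(as_list)   # so does len()
```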

### New files ###

* MMCIFlex.py: PLY-based module for tokenizing input.
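
To give a feel for what the tokenizer has to handle, here is a rough sketch written with the standard library `re` module instead of PLY, so it is not the code in MMCIFlex.py. It covers the two cases the commit log singles out (quoted values with spaces, and semicolon-delimited text fields) and ignores much of the real CIF grammar. The token names TAG and DATA come from the commit summary; LOOP and VALUE, and the function names, are assumptions made for this example.

```python
import re

# Simplified inline patterns; the full CIF grammar has more cases.
INLINE_RE = re.compile(
    r"'(?P<sq>.*?)'(?=\s|$)"      # single-quoted value, may contain spaces
    r'|"(?P<dq>.*?)"(?=\s|$)'     # double-quoted value
    r"|(?P<comment>#.*)"          # comment runs to the end of the line
    r"|(?P<bare>\S+)"             # anything else, up to whitespace
)


def classify(word):
    """Assign a token type to an unquoted word."""
    lowered = word.lower()
    if word.startswith("_"):
        return "TAG"
    if lowered == "loop_":
        return "LOOP"
    if lowered.startswith("data_"):
        return "DATA"
    return "VALUE"


def tokenize(text):
    """Return a list of (type, value) tuples for a CIF-like string."""
    tokens = []
    lines = iter(text.splitlines())
    for line in lines:
        if line.startswith(";"):
            # Text field: runs until the next line that starts with ';'.
            parts = [line[1:]]
            for continuation in lines:
                if continuation.startswith(";"):
                    break
                parts.append(continuation)
            tokens.append(("VALUE", "\n".join(parts)))
            continue
        for match in INLINE_RE.finditer(line):
            kind = match.lastgroup
            if kind == "comment":
                continue
            if kind == "bare":
                tokens.append((classify(match.group()), match.group()))
            else:
                # Quoted value: keep the text between the quotes.
                tokens.append(("VALUE", match.group(kind)))
    return tokens


sample = "data_test\n_entry.id 'my entry'\n_cell.note\n;line one\nline two\n;\n"
tokens = tokenize(sample)
```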

### What it needs ###
Addition of PLY dependency to setup.py.
I'm not quite sure how to handle this, as PLY wouldn't be necessary on
a platform running CPython, where the C module can be built. Thoughts?
Which non-CPython implementations are worth testing?
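
One possible way to keep PLY optional is to choose the lexer at import time: try the compiled module first and fall back to the pure-Python one only if that import fails. The sketch below shows the general pattern under that assumption. The helper name and the module names in the usage comment are hypothetical, and this is not necessarily how the pull request does it.

```python
import importlib


def load_first_available(candidates):
    """Import and return the first module in candidates that is importable."""
    errors = []
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError as exc:
            # Remember why this backend failed, for the final error message.
            errors.append("%s: %s" % (name, exc))
    raise ImportError("No lexer backend available (%s)" % "; ".join(errors))


# Hypothetical usage: prefer a compiled lexer, fall back to a PLY-based one.
# lexer_module = load_first_available(["my_c_lexer", "my_ply_lexer"])
```

With this arrangement PLY only has to be installed where the C extension is unavailable, and the error message lists every backend that was tried.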


New C module tested on Python 2.6 on Mac OS X and Debian. I hope it
still works on Windows.
On my machine, the C module processes a 30,000 line test file in 10-15
ms; the Python module takes ~150 ms.
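
For anyone who wants to reproduce this kind of comparison, a small harness along the following lines would do. It is a sketch only: `best_time_ms` is a hypothetical helper, and the callable passed in stands in for whatever tokenizes the whole test file.

```python
import time


def best_time_ms(func, repeats=5):
    """Run func several times and return the fastest wall-clock time in ms."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        best = min(best, elapsed_ms)
    return best


# Stand-in workload; replace with a call that lexes the full test file.
elapsed = best_time_ms(lambda: sum(range(100000)))
```

Taking the best of several runs reduces noise from disk caching and other processes.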

You can merge this Pull Request by running:

 git pull https://github.com/lennax/biopython MMCIF2

Or you can view, comment on it, or merge it online at:

 https://github.com/biopython/biopython/pull/33

-- Commit Summary --

* Ply test in progress.
* Quoted values with spaces are being broken.
* Removed hard inclusion of ply.
* Fixed quoted strings with spaces.
* Changed Parser call to 2Dict. Semicolons break.
* Changed Parser call to 2Dict. Semicolons break.
* Lexes full file w/o error, FIXME loops
* Tweak: comment handling
* Changed token "NAME" to "TAG"
* Using IUCr grammar. FIXME quote/semi
* Fixed quoted strings.
* Semicolon text field fixed, FIXME included \n
* Fixed semi newlines.
* non-eol temp fix, doesn't match single chars
* Lexes full CIF file with no noticed errors.
* Added timing.
* Added states to lexer.
* Lex loops into [header, [items], ...]; \d hacks.
* Enforced semicolon rule.
* Yacc works.
* Re-added values to lexer state 'loop'
* FIXME syntax error/hangs on full file.
* Lexer gathers values, added parse precedence.
* Minor lex cleanup.
* Testing exclusionary lex redo.
* Streamlined rules, no loop yet.
* Still won't yacc 30k line file.
* Merge branch 'master' of git://github.com/biopython/biopython into ply2
* Added __name__ __main__ check.
* Parser redo, still doesn't parse 30k line file.
* Added comments to tokenizer.
* Fixed lex module's callability from yacc.
* Fixed DATA token failure.
* Multiple improvements, still no 30k.
* Moved lexer arguments to constructor.
* Moved data input to constructor, added docs
* Validated to pep8.
* Merge branch 'master' of git://github.com/biopython/biopython into ply2
* Add MMCIF2Dict from ply branch.
* Remove flex header dependency of CIF parser.
* Update MMCIFParser call of MMCIF2Dict.
* PLY lexer works with MMCIF2Dict.
* Cleanup.
* Cleaned up import.
* Updated docstring.
* Subclassed dict.
* Restored MMCIFParser call to MMCIF2Dict.
* Removed main() from lex input.
* Restored newline.
* Fix C prototype warnings.
* Modifying python lexer to be substitutable w/ C.
* Make header for generated C.
* Import C lexer or Python lexer.
* Improvements and documentation.
* Uncomment GLOBAL token definition.
* PLY lexer and C lexer should be interchangeable.
* Improve error reporting of import.
* Turn on ply lex optimize.
* Call instance of Python lexer.
* Working on implementing class in C module.
* Start unit test for MMCIF.
* Minimal unit test for MMCIFParser.
* Revert to old generated C; manually added noyywrap
* Manually added function prototypes to generated C.
* Merge branch 'ply2' into dev
* Merge branch 'ply' into dev
* Merge branch 'c-dev' into dev
* Merge branch 'master' of git://github.com/biopython/biopython into dev
* Cleaning up old files.
* More cleanup.
* Merging Parser from MMCIFlex branch.
* Parser and unit test for PyCIFRW
* Python and C lexer APIs are now identical.
* Add copyright and license notices.
* Merge branch 'master' of git://github.com/biopython/biopython into dev
* Trying GnuWin32 flex-generated C.
* Win flex generated with new mmcif.lex
* GnuWin32 flex generated C, used dos2unix for CRLF
* Added correct author to flex C module.
* Merge branch 'master' of git://github.com/biopython/biopython into dev
* Merge branch 'master' of git://github.com/biopython/biopython into dev
* Change map() to list comprehensions for 3 compat.
* Renamed python lexer to match C module.
* Added file IO exception to C module.
* Tweak lexer module import.
* Prep Python CIF lexer for pull request.
* Whitespace tweaks.

-- File Changes --

M Bio/PDB/MMCIF2Dict.py (20)
M Bio/PDB/MMCIFParser.py (8)
A Bio/PDB/mmCIF/MMCIFlex.py (253)
M Bio/PDB/mmCIF/MMCIFlexmodule.c (122)

-- Patch Links --

 https://github.com/biopython/biopython/pull/33.patch
 https://github.com/biopython/biopython/pull/33.diff

---
Reply to this email directly or view it on GitHub:
https://github.com/biopython/biopython/pull/33



