[Biopython-dev] XML parsing library for new modules

Fri May 1 12:28:06 UTC 2009

Eric;
Thanks for summarizing the issues. I know Peter is taking a few
well-deserved days off, but I suspect he will have some thoughts when he
returns. We'd love to hear from others who have used different Python
XML parsers.

I lean towards ElementTree for reasons of code clarity. SAX parsers
require a lot of boilerplate code, and they can be tricky with nested
elements; I always find myself writing a lot of "if in_tag: ... elif
in_tag: ..." style code. ElementTree eliminates many of these issues,
which should result in code that is easier to maintain.
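
To make the SAX versus ElementTree contrast concrete, here is a minimal
sketch that pulls clade names out of a tiny, made-up PhyloXML-like
document. The input and the helpers names_with_sax and
names_with_elementtree are hypothetical, for illustration only. The SAX
handler has to track an "in tag" flag and buffer text by hand, while
ElementTree hands back the nesting directly:

```python
import xml.sax
import xml.etree.ElementTree as ElementTree

# Hypothetical, simplified PhyloXML-like input: nested <clade> elements,
# each with a <name> child. Real PhyloXML also declares a namespace.
DOC = b"""<phylogeny>
  <clade><name>root</name>
    <clade><name>A</name></clade>
    <clade><name>B</name></clade>
  </clade>
</phylogeny>"""


class CladeNameHandler(xml.sax.ContentHandler):
    """SAX style: parser state is tracked by hand with flags and buffers."""

    def __init__(self):
        super().__init__()
        self.in_name = False  # the "in_tag" style flag
        self.buffer = []
        self.names = []

    def startElement(self, name, attrs):
        if name == "name":
            self.in_name = True
            self.buffer = []

    def characters(self, content):
        # characters() may be called several times for one text node,
        # so the pieces are buffered and joined in endElement().
        if self.in_name:
            self.buffer.append(content)

    def endElement(self, name):
        if name == "name":
            self.in_name = False
            self.names.append("".join(self.buffer).strip())


def names_with_sax(data):
    handler = CladeNameHandler()
    xml.sax.parseString(data, handler)
    return handler.names


def names_with_elementtree(data):
    """ElementTree style: the nesting is already in the parsed tree."""
    root = ElementTree.fromstring(data)
    # iter() walks every <clade> in document order; findtext() reads the
    # direct <name> child of each one.
    return [clade.findtext("name") for clade in root.iter("clade")]
```

Both helpers return ["root", "A", "B"] for this input; the difference is
how much state the SAX version has to manage to get there.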

Brad

> I'm writing a parser for the PhyloXML format for Google Summer of Code this
> year, and as the name would imply, it requires parsing some large XML files.
> The existing modules in Biopython for parsing XML formats seem to use
> xml.sax in the standard library. In Python 2.5, a faster and more Pythonic
> parser was added to the standard lib: ElementTree (xml.etree), in
> pure-Python and C-enhanced flavors. How do you feel about each of these
> libraries as the basis for a new Biopython module?
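
Since the files can be large, one ElementTree feature worth noting here
is iterparse, which reports each element as soon as it is complete so it
can be processed and then cleared. Here is a minimal sketch, assuming a
simplified, namespace-free input with one <phylogeny> element per tree
(real PhyloXML declares a namespace, so the tag comparison would need
adjusting). iter_tree_names is a hypothetical helper:

```python
import io
import xml.etree.ElementTree as ElementTree

# Hypothetical, namespace-free stand-in for a large PhyloXML file that
# holds many <phylogeny> elements.
DATA = b"""<phyloxml>
  <phylogeny><name>tree1</name><clade/></phylogeny>
  <phylogeny><name>tree2</name><clade/></phylogeny>
</phyloxml>"""


def iter_tree_names(source):
    """Yield the name of each <phylogeny> as soon as it has been parsed."""
    for event, elem in ElementTree.iterparse(source, events=("end",)):
        if elem.tag == "phylogeny":
            yield elem.findtext("name")
            # Drop the finished tree's contents to keep memory use low;
            # the emptied element itself stays attached to the root.
            elem.clear()


print(list(iter_tree_names(io.BytesIO(DATA))))
```

This prints ['tree1', 'tree2'] without ever building the full tree of
every phylogeny at once.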
> 
> Here are some interesting benchmarks:
> http://effbot.org/zone/celementtree.htm#benchmarks
> 
> The ElementTree library is also available as a standalone package,
> compatible back to Python 2.1, and the lxml package also offers an
> independent implementation. So maintaining compatibility with Python 2.4
> would require the availability of one of these third-party packages, and my
> code would try each of these imports in order:
> 
> try:
>     from xml.etree import cElementTree as ElementTree
> except ImportError:
>     try:
>         from xml.etree import ElementTree
>     except ImportError:
>         try:
>             from lxml import etree as ElementTree  # separate lxml package
>         except ImportError:
>             try:
>                 import cElementTree as ElementTree  # standalone package
>             except ImportError:
>                 from elementtree import ElementTree  # standalone package
> 
> Then one day, when Python 2.4 is no longer supported, only the first two
> imports would be needed. (The second is for sites that disable C
> extensions, such as Google App Engine, or for alternative Python
> implementations such as Jython.)
> 
> Another option is xml.parsers.expat, but from some quick Googling, it
> appears that the Python zeitgeist strongly favors xml.etree for new code.
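
For comparison, expat (the parser that the standard library's default
xml.sax driver is built on) is callback-based and even lower level than
SAX. Here is a minimal sketch that only counts start tags; count_tags is
a hypothetical helper:

```python
import xml.parsers.expat


def count_tags(data):
    """Count start tags by name using expat's callback interface."""
    counts = {}

    def start(name, attrs):
        # Called once for every start tag (and empty-element tag).
        counts[name] = counts.get(name, 0) + 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.Parse(data, True)  # True marks this as the final chunk
    return counts


print(count_tags(b"<a><b/><b/></a>"))
```

This prints {'a': 1, 'b': 2}. Anything beyond flat counting (nesting,
text buffering) needs the same hand-rolled state as the SAX handler.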
> 
> Thoughts?
> 
> Thanks,
> Eric
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev