[Bioperl-l] XML parsers
Hilmar Lapp
hlapp at gnf.org
Mon Feb 3 09:54:03 EST 2003
Thanks for your input. Very comprehensive, very helpful. You may have
noticed that my working with XML has been limited ... -hilmar
On Monday, February 3, 2003, at 02:07 AM, Robin Berjon wrote:
> Hilmar Lapp wrote:
>> I know a couple of people out there are using XML parsers for bioperl
>> modules (Heikki, ChrisM, I guess many more). There's a variety of
>> parser modules available from CPAN. In bioperl we currently have
>> dependencies on
>> XML::Parser
>> XML::Parser::PerlSAX
>> XML::Twig
>> (and XML::Writer, which I guess is not exactly for parsing ...).
>> - What is the XML parser that people generally prefer currently, and
>> if you don't mind mentioning it, why? (this doesn't have to be one of
>> the above)
>
> XML::Parser is more or less on the deprecation slope. It's still used
> in places as a low level thing but it is very strongly recommended to
> *not* use it for new development. Only small bugfixes and tests on new
> versions of expat are to be provided; it is likely that some of its
> somewhat larger bugs, such as those found in namespace support, will
> never be fixed. The reason for this is that its interface is dated:
> everything new uses PerlSAX 2.
>
> XML::Parser::PerlSAX is just as deprecated because it's a partial
> implementation of PerlSAX 1. PerlSAX 1 isn't compatible with PerlSAX 2
> (unless you insert a converter).
>
> This may give the impression that XML folks like making tools to
> better deprecate them later, but that's not so. PerlSAX 2 is stable,
> and while a 2.1 may happen at some point this year a 3.0 is not
> currently on the map. Were it to happen, backwards compatibility would
> be maintained. PerlSAX 2 is the version that was heavily advocated and
> advertised (if you aren't noticing that from this post ;) and the one
> for which the greatest number of tools were written.
>
> Using PerlSAX 2 you have a wide array of tools:
>
> - several XML parsers:
> XML::SAX::Expat, wrapping XML::Parser to make it behave as a SAX2
> stream, very useful if you have XML::Parser installed and don't want
> to worry about installing extra stuff
>
> XML::SAX::PurePerl, a pure Perl parser, best for portability (but
> *slow*)
>
> XML::LibXML::SAX, a parser built on top of libxml2
>
> - an XML parser factory, XML::SAX::ParserFactory. Using a simple
> interface, this will select a SAX parser amongst those that you have
> installed. Very useful to write portable code;
>
> - SAX parsers for non-XML data sources such as CSV, Excel, Perl data
> structures, directory trees...anything you want;
>
> - many SAX filters doing all sorts of manipulations (there are too
> many to list here, just look for XML::Filter::*);
>
> - a pipeline manager (and much more), XML::SAX::Machines which will
> make setting up a processing pipeline for SAX tools really simple and
> elegant;
>
> - a SAX Writer, XML::SAX::Writer which is a framework to write to XML
> as well as non-XML outputs.
>
> All of the above plug together with a high degree of interop; the
> combinations are endless :)
>
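For readers who have not seen the SAX style, here is a minimal sketch of the parser -> filter -> handler chain described above. It is written in Python using only the standard library's xml.sax, which follows the same event model; it is an analogue and not the PerlSAX 2 API. The class and function names (LowercaseNames, ElementCounter, count_elements) are made up for illustration. xml.sax.make_parser() plays roughly the role of a parser factory here, in that it returns whichever SAX driver is available.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.saxutils import XMLFilterBase


class LowercaseNames(XMLFilterBase):
    """Filter stage (hypothetical): lower-case element names, pass the rest through."""

    def startElement(self, name, attrs):
        super().startElement(name.lower(), attrs)

    def endElement(self, name):
        super().endElement(name.lower())


class ElementCounter(ContentHandler):
    """Final handler (hypothetical): count how often each element name occurs."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1


def count_elements(xml_text):
    parser = xml.sax.make_parser()       # factory call: returns an available SAX driver
    pipeline = LowercaseNames(parser)    # parser feeds events into the filter
    handler = ElementCounter()
    pipeline.setContentHandler(handler)  # filter forwards events to the handler
    pipeline.parse(io.BytesIO(xml_text.encode("utf-8")))
    return handler.counts


counts = count_elements("<Seq><Feature/><FEATURE/></Seq>")
```

Each stage only sees events, so further filters can be stacked between the parser and the final handler without either of them changing.
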
> Don't forget to check out Kip's articles in the xml.com "Perl & XML"
> column (to resume soon).
>
> And that's just for the (SAX) parsers.
>
> XML::Twig isn't a parser, it's a nice tree-based interface to XML
> data. It is very useful when you have a large document you wish to
> process one subtree at a time. In the same vein see
> XML::Filter::Dispatcher. Both are SAX-in, SAX-out.
>
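The "one subtree at a time" idea can be sketched as follows. This is not XML::Twig's interface; it uses Python's standard xml.etree.ElementTree.iterparse as a stand-in for the same technique, and the element names (entry, name) and the function iter_entries are assumptions chosen for the example.

```python
import io
import xml.etree.ElementTree as ET


def iter_entries(stream, tag="entry"):
    """Yield (id, name) for each completed <entry> subtree, then discard it."""
    for event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == tag:
            yield elem.get("id"), (elem.findtext("name") or "")
            elem.clear()  # release this subtree's children and text once processed


data = (b"<db>"
        b"<entry id='a'><name>x</name></entry>"
        b"<entry id='b'><name>y</name></entry>"
        b"</db>")
entries = list(iter_entries(io.BytesIO(data)))
```

Only one entry's subtree needs to be fully populated at any moment, which is what makes the approach attractive for large documents.
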
> Another tree-based favourite is XML::LibXML. It exposes a DOM and
> requires the whole document to be in memory but it is very fast and is
> compatible with XML::LibXSLT, the interface to the best XSLT processor
> on earth.
>
> I can go on for a while, but it might be better focussed if you ask
> specific questions ;)
>
>> We have been experimenting here with XML::Simple and
>> XML::SAX::ParserFactory. The former provides a nice perl'ish view on
>> the DOM, but seems to be very slow. Has anyone played with those and
>> had experiences, positive or negative?
>
> XML::Simple is great for simple things (it makes them easy) but it
> does tend to break at some degree of complexity. Please note that it
> is not a view on the DOM. The DOM corresponds to a certain view on XML
> (different from other views such as the XPath view).
>
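A small sketch may help show why a plain data-structure view is not a view on the DOM and tends to break as documents get more complex. The to_dict function below is a hypothetical, deliberately simplified element-to-dict mapping written in Python; it is not XML::Simple's actual algorithm. It works for record-like XML but drops the text and ordering of mixed content.

```python
import xml.etree.ElementTree as ET


def to_dict(elem):
    """Hypothetical simplified mapping: leaf -> str, parent -> dict of children."""
    children = list(elem)
    if not children:
        return (elem.text or "").strip()
    result = {}
    for child in children:
        value = to_dict(child)
        if child.tag in result:
            # repeated child names are collected into a list
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


# Record-like XML maps cleanly onto a dict.
record = to_dict(ET.fromstring("<seq><id>X1</id><len>42</len></seq>"))

# Mixed content: the text between the <b> elements has no place in the dict.
mixed = to_dict(ET.fromstring("<p>BRCA1 <b>binds</b> to <b>DNA</b>.</p>"))
```

In the second case the surrounding text and the interleaving of text and elements are lost, whereas a DOM would keep them as ordered child nodes.
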
> --
> Robin Berjon <robin.berjon at expway.fr>
> Research Engineer, Expway http://expway.fr/
> 7FC0 6F5F D864 EFB8 08CE 8E74 58E6 D5DB 4889 2488
>
>
--
-------------------------------------------------------------
Hilmar Lapp email: lapp at gnf.org
GNF, San Diego, Ca. 92121 phone: +1-858-812-1757
-------------------------------------------------------------