[Biojava-l] XML Parser changeover
Thomas Down
td2@sanger.ac.uk
Tue, 15 Aug 2000 11:09:20 +0100
Hi...
BioJava is currently using Sun's old `crimson' XML parser
whenever XML data is used. I'd like to propose that we drop
this and switch to the Apache group's Xerces-J parser for
the 1.1 release.
There are several reasons why it makes sense to make this
change now:
- The Blast and PDB parsers developed by Simon Brocklehurst
and CAT require the SAX2 API, included in Xerces but not
crimson.
- An increasing number of `external' XML formats (including
GAME 0.3) now seem to be using namespaces, so it would be
useful to have a namespace-aware XML parser, including
support for the SAX2 and DOM level 2 APIs.
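For anyone who hasn't used the SAX2 namespace support yet, here is a
rough sketch of what a namespace-aware parse looks like. It goes through
the JAXP factory bundled with current JDKs rather than calling Xerces
directly, so it is an illustration and not necessarily how BioJava will
wire things up. The class name NamespaceDemo, the example.org namespace
URI and the element names are all made up for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of BioJava.
public class NamespaceDemo {

    /** Parses the document and returns "{namespaceURI}localName" for each start tag. */
    public static List<String> elementNames(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Namespace processing is off by default in JAXP, so enable it explicitly.
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        final List<String> names = new ArrayList<>();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // SAX2 reports the namespace URI and local name separately;
                // the URI is the empty string for elements in no namespace.
                names.add("{" + uri + "}" + localName);
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Made-up document: two elements in a namespace, one without.
        String xml = "<g:game xmlns:g=\"http://example.org/game\">"
                   + "<g:seq/><note/></g:game>";
        for (String name : elementNames(xml)) {
            System.out.println(name);
        }
    }
}
```

The point is that the handler sees the namespace URI, not just the
prefixed name, so two formats that both use an element called `seq'
can be told apart.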
Right now, there isn't actually very much XML-dependent code
in BioJava -- the main use is in the AlphabetManager, which loads
the definitions for the DNA and Protein alphabets from an XML
document. It should be easy to make the switch to Xerces now,
so that all future XML code will have access to the new APIs.
From a user's point of view, the only visible change will be
that BioJava applications will need xerces.jar on the classpath
rather than xml.jar.
BioJava 1.0x releases will continue to use crimson (xml.jar).
Are there any thoughts on this? If not, I'd like to make the
change in the next day or so, so that we can check in the blast-
parsing code.
Happy hacking,
Thomas.
--
One of the advantages of being disorderly is that one is
constantly making exciting discoveries.
-- A. A. Milne