[Biojava-l] XML Parser changeover

Thomas Down td2@sanger.ac.uk
Tue, 15 Aug 2000 11:09:20 +0100


Hi...

BioJava is currently using Sun's old `crimson' XML parser
whenever XML data is used.  I'd like to propose that we drop
this and switch to the Apache group's Xerces-J parser for
the 1.1 release.

There are several reasons why it makes sense to make this
change now:

  - The Blast and PDB parsers developed by Simon Brocklehurst
    and CAT require the SAX2 API, included in Xerces but not
    crimson.

  - An increasing number of `external' XML formats (including
    GAME 0.3) now seem to be using namespaces, so it would be
    useful to have a namespace-aware XML parser, including
    support for the SAX2 and DOM level 2 APIs.
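
As a rough illustration of what namespace-aware SAX2 parsing gives us,
here is a minimal sketch. It makes some assumptions that are not part of
the proposal: it goes through the JAXP factory bundled with current JDKs
rather than instantiating a parser from xerces.jar directly, and the class
name NamespaceSaxSketch, the sample document and its namespace URI are all
made up for the example (this is not real GAME markup).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class NamespaceSaxSketch {

    /** Returns "{namespaceURI}localName" for each element start, in document order. */
    public static List<String> elementNames(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Without this, SAX2 handlers may not receive namespace URIs or local names.
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        final List<String> names = new ArrayList<>();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // uri is the empty string for elements in no namespace.
                names.add("{" + uri + "}" + localName);
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical document: the prefix "g" and its URI are invented.
        String xml = "<g:game xmlns:g=\"http://example.org/hypothetical-game\">"
                   + "<g:seq/><note/></g:game>";
        for (String name : elementNames(xml)) {
            System.out.println(name);
        }
    }
}
```

The point is that the handler sees the namespace URI and local name
separately from the prefixed name, so code can match elements by namespace
rather than by whatever prefix a given document happens to use.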

Right now, there isn't actually very much XML-dependent code
in BioJava -- the main use is in the AlphabetManager, which loads
the definitions for the DNA and Protein alphabets from an XML
document.  It should be easy to make the switch to Xerces now,
so that all future XML code will have access to the new APIs.

From a user's point of view, the only visible change will be
that BioJava applications will need xerces.jar on the classpath
rather than xml.jar.
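
If it helps anyone checking their setup, a quick way to see whether the
Xerces classes are visible is to try loading one of them. This is only a
sketch: ParserClasspathCheck and isLoadable are names invented for the
example, and it assumes org.apache.xerces.parsers.SAXParser is the class
you want to probe for (pass other class names on the command line to
check those instead).

```java
public class ParserClasspathCheck {

    /** Returns true if the named class can be found by this class's loader. */
    public static boolean isLoadable(String className) {
        try {
            // 'false' means: locate the class but do not run its static initializers.
            Class.forName(className, false, ParserClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Default probe: the SAX parser class shipped in Xerces-J.
        String[] candidates = args.length > 0
                ? args
                : new String[] { "org.apache.xerces.parsers.SAXParser" };
        for (String name : candidates) {
            System.out.println(name + (isLoadable(name) ? " : found" : " : NOT found"));
        }
    }
}
```

Run it with the same classpath as your BioJava application; a "NOT found"
line for the Xerces class suggests xerces.jar is missing from that
classpath.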

BioJava 1.0x releases will continue to use crimson (xml.jar). 

Are there any thoughts on this?  If not, I'd like to make the
change in the next day or so, so that we can check in the blast-
parsing code.

Happy hacking,
   Thomas.
-- 
One of the advantages of being disorderly is that one is
constantly making exciting discoveries.
                                       -- A. A. Milne