[Biojava-l] Final new CAT classes pre 1.1

Fri, 02 Feb 2001 22:20:20 +0000

Dear All,

As promised, I've just checked in a final set of classes and major mods I
wanted to get in before the feature freeze ahead of the 1.1 release.  The
changes are in the org.biojava.bio.program.sax package.

The idea behind the changes was to make it much easier and quicker for
people to write additional SAX2-compliant SAXParsers in the future.

The changes/additions are essentially as follows:

o Enhancements to the abstract superclass from which biojava SAXParser
classes inherit
o Refactoring of the existing classes to fit with these changes (most
existing classes affected). Only the internals were changed; there were no
changes to public interfaces, so this won't affect code that uses them.
o A couple of tools (i.e. classes) to help SAXParser developers test/debug
their parsers conveniently/efficiently at the package level.

NB The documentation is way too light at the moment (just thought I'd get
that in before anyone else does!)

As a test of the modified design, I've just implemented a couple of new,
simple SAXParsers to see how long it took to write them:

o One for parsing Fasta formatted sequences  - FastaSequenceSAXParser
o One for parsing Clustal W alignments - ClustalWAlignmentSAXParser

Time to implement (errr... I mean hack) these from scratch was less than
an hour in total, so I think I count that as a success.

NB I've introduced at least one minor buglet in these changes that
people probably won't notice, i.e. everything still pretty much seems to
work, at least in my hands.

I can't fix this tonight (gotta be out of here in 5 minutes) and have
no free time this weekend.  I expect some minor changes to our (CAT) code
as we go through the tidying up/bug-fixing process for releasing 1.1, and
I won't   There should be improvements to the documentation (JavaDocs and
tutorials etc.)!

Anyhow, that means for BioJava 1.1, we will be providing (to varying
levels of completeness) SAX2-compliant, event-based parsing for:

- Data from the results of sequence/similarity searches -
  o NCBI Blast
  o WU-Blast
  o HMMER

- Macromolecular structure data -
  o PDB

- Collections of sequences (aligned and unaligned) -
  o Fasta formatted sequence data
  o Clustal W alignments (.aln format)

And, for all the above, the semantic meaning of data is extensively (and,
sometimes, perhaps even slightly elegantly) described explicitly in the
models used to derive the SAX2 events.
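
For anyone who hasn't met the idea of driving SAX2 events from a non-XML
format, here is a minimal sketch of the general technique.  It is NOT the
code in org.biojava.bio.program.sax: the class name FastaEventSketch, the
element names "SequenceCollection" and "Sequence", and the "id" attribute
are all invented for illustration.  Only the org.xml.sax interfaces are
the real SAX2 API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch only: not the BioJava FastaSequenceSAXParser.
class FastaEventSketch {

    // Reads Fasta-style text and reports it to the handler as SAX2 events.
    // Element and attribute names here are assumptions made for this example.
    static void parse(BufferedReader in, ContentHandler handler)
            throws IOException, SAXException {
        handler.startDocument();
        handler.startElement("", "SequenceCollection", "SequenceCollection",
                new AttributesImpl());
        boolean inSequence = false;
        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (line.startsWith(">")) {
                // A new header line closes any sequence that is still open.
                if (inSequence) {
                    handler.endElement("", "Sequence", "Sequence");
                }
                AttributesImpl atts = new AttributesImpl();
                atts.addAttribute("", "id", "id", "CDATA",
                        line.substring(1).trim());
                handler.startElement("", "Sequence", "Sequence", atts);
                inSequence = true;
            } else if (inSequence) {
                // Residue lines are passed on as character data.
                char[] residues = line.toCharArray();
                handler.characters(residues, 0, residues.length);
            }
        }
        if (inSequence) {
            handler.endElement("", "Sequence", "Sequence");
        }
        handler.endElement("", "SequenceCollection", "SequenceCollection");
        handler.endDocument();
    }

    // Example consumer: gathers "id=residues" strings from the event stream.
    static List<String> collect(String fasta) throws IOException, SAXException {
        final List<String> out = new ArrayList<String>();
        DefaultHandler handler = new DefaultHandler() {
            private String id;
            private final StringBuilder seq = new StringBuilder();

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if ("Sequence".equals(localName)) {
                    id = atts.getValue("id");
                    seq.setLength(0);
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (id != null) {
                    seq.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("Sequence".equals(localName)) {
                    out.add(id + "=" + seq);
                    id = null;
                }
            }
        };
        parse(new BufferedReader(new StringReader(fasta)), handler);
        return out;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(collect(">seq1\nACGT\nTTGA\n>seq2\nMKV\n"));
    }
}
```

The point of the design is that the consumer only ever sees a
ContentHandler event stream, so the same handler code could in principle
sit behind a real XML parser or behind a native-format parser like this
one.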

As the tidying up process progresses, I'll get the corresponding XML
descriptions sorted out, either as DTDs, Schemas or both, to make it easy
for people to think about XML ContentHandlers.

's all,

S.
--
Simon M. Brocklehurst, Ph.D.
Head of Bioinformatics & Advanced IS
Cambridge Antibody Technology
The Science Park, Melbourn, Cambridgeshire, UK
http://www.CambridgeAntibody.com/
mailto:simon.brocklehurst@CambridgeAntibody.com