[Biojava-l] Fasta & EMBL feature table parsing

Simon Brocklehurst simon.brocklehurst@CambridgeAntibody.com
Mon, 27 Nov 2000 15:29:55 +0000


Hi Keith,

First, yes the grass is greener with Java ;-)

The SAX2 event-based parsing framework is designed to be extensible (for
example, as well as the blast/wu-blast/hmmer stuff, there is
proof-of-principle 3-D structure stuff, which will be enhanced shortly).

I'm sure you're not alone in being confused - I don't think there is
enough documentation there to make it easy to get going on using the parsers
to build applications, let alone to extend the system by writing new SAX
parsers.

I have been meaning to put up some more documentation and tutorials on the
biojava web site to make it easy for people to get going.  As a start on
this, I will try to get some UML class diagrams up late today.  These
should certainly help you figure out which classes can be reused.

The place to start with this kind of thing is to figure out exactly what
SAX2 events you will need to throw.  What this means is that you need to
work out what the XML format would be if your data were actually in XML
format, and then put together an XML DTD or Schema to describe it.
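
To make that idea concrete, here is a rough sketch of what "throwing SAX2
events" for a Fasta-like file could look like.  It is only an illustration
and not the actual biojava parser code: the class name FastaSaxSketch and
the element/attribute names (SequenceCollection, Sequence, description) are
made up for this example, and a real design would take them from whatever
DTD or Schema you settle on.  It uses only the standard org.xml.sax
interfaces.

```java
import java.io.BufferedReader;
import java.io.IOException;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical sketch: reports Fasta-style text to a SAX2 ContentHandler
// as if it had been written as
//   <SequenceCollection>
//     <Sequence description="...">RESIDUES</Sequence>
//   </SequenceCollection>
// The element and attribute names are assumptions made for this example.
class FastaSaxSketch {

    static void parse(BufferedReader in, ContentHandler handler)
            throws IOException, SAXException {
        handler.startDocument();
        handler.startElement("", "SequenceCollection", "SequenceCollection",
                new AttributesImpl());

        boolean inSequence = false;
        String line;
        while ((line = in.readLine()) != null) {
            line = line.trim();
            if (line.isEmpty()) {
                continue; // ignore blank lines
            }
            if (line.startsWith(">")) {
                // A header line closes any open record and opens a new one.
                if (inSequence) {
                    handler.endElement("", "Sequence", "Sequence");
                }
                AttributesImpl atts = new AttributesImpl();
                atts.addAttribute("", "description", "description", "CDATA",
                        line.substring(1).trim());
                handler.startElement("", "Sequence", "Sequence", atts);
                inSequence = true;
            } else if (inSequence) {
                // Residue lines become character data of the open element.
                char[] chars = line.toCharArray();
                handler.characters(chars, 0, chars.length);
            }
        }

        if (inSequence) {
            handler.endElement("", "Sequence", "Sequence");
        }
        handler.endElement("", "SequenceCollection", "SequenceCollection");
        handler.endDocument();
    }
}
```

An application would then pass in its own ContentHandler (for example a
subclass of org.xml.sax.helpers.DefaultHandler) and build whatever objects
it needs from the events, without knowing that the input was never XML.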

If you have any detailed questions, please feel free to drop a note to the
list and I will do my best to help.

Simon
--
Simon M. Brocklehurst, Ph.D.
Head of Bioinformatics & Advanced IS
Cambridge Antibody Technology
The Science Park, Melbourn, Cambridgeshire, UK
http://www.CambridgeAntibody.com/
mailto:simon.brocklehurst@CambridgeAntibody.com