[Biojava-l] TokenParser.TPStreamParser

Thomas Down td2@sanger.ac.uk
Sun, 10 Jun 2001 15:12:03 +0100


On Sun, Jun 10, 2001 at 02:40:33PM +0100, David Huen wrote:
> The above appears to fubar when fed sequences with whitespace.
> Unfortunately, these are common with XML derived sequences.  Would anyone
> object to a modification such that whitespace characters are ignored
> rather than worthy of an exception?

As I recall, I wrote TPStreamParser to be compatible with
the existing TokenParser.  I'd actually be rather reluctant
to add whitespace ignoring at this level, because it effectively
means that you can /never/ use whitespace characters as tokens
(which is probably a very bad idea, but it still worries me a
little to completely rule it out).

How about the following alternative strategy:

I presume you're talking about driving a StreamParser from a
SAX or StAX event source.  The S[t]AX listener will receive
arrays of characters.  You can then identify blocks of
non-whitespace within this array, and pass them to the
StreamParser.characters(char[], int, int) method.  No
need to copy the characters into another array or anything,
so it should be quite efficient.

The StreamParser interface was designed with this pattern
in mind.
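
To make the idea concrete, here is a rough sketch of the
whitespace-skipping loop.  It is illustrative only: CharSink is a
hypothetical stand-in with the same characters(char[], int, int)
shape as the StreamParser method mentioned above, and the class
and method names are made up for the example.  Each run of
non-whitespace is handed on by offset and length, so nothing is
copied.

```java
public class WhitespaceSkippingFeeder {
    /**
     * Hypothetical stand-in for the parser callback. Only the
     * characters(char[], int, int) shape is taken from the message above.
     */
    public interface CharSink {
        void characters(char[] data, int start, int length);
    }

    /**
     * Scans data[start .. start+length) and passes each maximal run of
     * non-whitespace characters to the sink, in place, without copying.
     */
    public static void feed(char[] data, int start, int length, CharSink sink) {
        int end = start + length;
        int i = start;
        while (i < end) {
            // Skip any whitespace before the next block.
            while (i < end && Character.isWhitespace(data[i])) {
                i++;
            }
            int runStart = i;
            // Advance to the end of the non-whitespace block.
            while (i < end && !Character.isWhitespace(data[i])) {
                i++;
            }
            if (i > runStart) {
                sink.characters(data, runStart, i - runStart);
            }
        }
    }

    public static void main(String[] args) {
        char[] buf = "ACGT \n  TTGA\tCC".toCharArray();
        StringBuilder seen = new StringBuilder();
        // The sink here just collects what it is given.
        feed(buf, 0, buf.length, (d, s, l) -> seen.append(d, s, l));
        System.out.println(seen); // prints ACGTTTGACC
    }
}
```

In a real handler, feed() would presumably be called from the
listener's own characters callback, with the sink forwarding to
the StreamParser.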

Would this do what you want?

    Thomas.