[Biojava-l] Bug, or Feature? In TokenParser.TPStreamParser

Thomas Down td2@sanger.ac.uk
Tue, 26 Jun 2001 14:30:23 +0100


On Tue, Jun 26, 2001 at 09:32:21AM -0400, Cox, Greg wrote:
> 	I ran across some non-intuitive behavior, and I wanted to check if I
> should update documentation or implementation.  In testing, I created an
> embl parser with an embl file former as the listener.  
> 	TokenParser.TPStreamParser.characters() calls addSymbols on the
> registered SeqIOListener.  TPStreamParser() hands a 256 Symbol array, where
> the unused spaces are set to null, but EmblFileFormer expects a completely
> full array, causing a null pointer exception. 
> 	I'd like to change TPStreamParser to trim the array before handing
> it out to the Listener.  Will this break anything?

When I designed the SeqIOListener interface, I followed
a pattern which is also seen in some of the Java IO code,
and in the SAX interfaces: passing an array, an offset, and
a length.  The idea is that it minimizes memory churn by
allowing the same array to be reused, even if, for a given
call, it's not completely full.
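
For reference, here is a minimal sketch of that pattern as it
appears in java.io, using Reader.read(char[], int, int).  The
class name BufferReuseDemo and the helper drain() are made up
for illustration; only the Reader and StringBuilder calls are
standard library.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class BufferReuseDemo {
    // Hypothetical helper: drains a Reader through a single reused buffer.
    static String drain(Reader in, int bufferSize) {
        char[] buf = new char[bufferSize];   // allocated once, reused for every read
        StringBuilder out = new StringBuilder();
        try {
            int n;
            while ((n = in.read(buf, 0, buf.length)) != -1) {
                // Only buf[0..n) is valid for this call; anything past n is
                // left over from an earlier read or was never filled.
                out.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Ten characters through a four-character buffer: the last read is partial.
        System.out.println(drain(new StringReader("acgtacgtac"), 4));
    }
}
```

The consumer never assumes the buffer is full; it trusts the
count it was given.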

So right now, the bug is in EmblFileFormer (which is ignoring
the start and length parameters passed into addSymbols).  It's
a fairly minor change to fix this.
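
Roughly, the fix amounts to looping over the window that was
passed in rather than over the whole array.  The sketch below
is not the BioJava code: SymbolListener is a simplified,
hypothetical stand-in for SeqIOListener, Strings stand in for
Symbols, and CollectingListener is an invented example class.

```java
import java.util.ArrayList;
import java.util.List;

public class OffsetLengthListenerDemo {
    // Hypothetical, simplified stand-in for the listener callback
    // (assumption: not the real SeqIOListener signature).
    interface SymbolListener {
        void addSymbols(String[] syms, int start, int length);
    }

    // Reads only the window [start, start + length) of the shared array.
    static class CollectingListener implements SymbolListener {
        final List<String> seen = new ArrayList<>();

        @Override
        public void addSymbols(String[] syms, int start, int length) {
            // Looping to syms.length instead would hit the null slots.
            for (int i = start; i < start + length; i++) {
                seen.add(syms[i]);
            }
        }
    }

    public static void main(String[] args) {
        // A 256-slot buffer with only three slots filled, as in the report.
        String[] buffer = new String[256];
        buffer[0] = "a";
        buffer[1] = "c";
        buffer[2] = "g";

        CollectingListener listener = new CollectingListener();
        listener.addSymbols(buffer, 0, 3);
        System.out.println(listener.seen);
    }
}
```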

If there's really consensus that the array/offset/length pattern
is a bad idea, we should change the SeqIOListener interface itself,
rather than forcing implementations to honor a stricter contract
than the one the interface implies.

   Thomas.