[Biojava-l] HitSectionSAXParser and BlastLikeVersionSupport

Simon Brocklehurst simon.brocklehurst@CambridgeAntibody.com
Thu, 07 Feb 2002 10:32:12 +0000


Roy Park wrote:

> Hi there,
>
> I'm trying to write a small program that reads in only the HIT section of
> BLAST output (i.e. starting with ">"), by instantiating a
> HitSectionSAXParser class, and I noticed that the BlastLikeVersionSupport
> variable oVersion needs to be already populated by the time the class is
> instantiated (e.g. with BlastLikeVersionSupport.iProgram = some value).
> Thing is, the iProgram member currently has a private scope, and there is no
> method available to set it to some other value (i.e. setProgram()).
>
> I know that from the look of the *SAXParser classes, they were written
> specifically only to process the full BLAST results.  Could we make all
> *SAXParser classes public, as well as make setters for the
> BlastLikeVersionSupport class?  Let me know what you think.  Thanks!

Hi Roy,

We may need to bounce this back and forth a bit to get a clear idea of what
you're trying to do.

I don't think making all the classes public is the way to go - we're trying to
keep things both easy to use *and* maintainable.  Hopefully we can find another
solution.

To recap on the essential idea of event-based parsing...  the beauty of
event-based parsing is that you need consume only the events you are interested
in.

Thus in your specific case (assuming you're dealing with real Blast output, and
not some format not supported by the framework) you can get the
*functionality* you require by simply writing an appropriate ContentHandler that
deals with SAX messages relating only to the HIT section.
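
As a rough illustration of the idea, here is a minimal sketch of a handler that
consumes only hit events and ignores everything else. The element name "Hit" is
an assumption made for the example (check the framework's DTD for the real
names), and the events here come from the JDK's own XML parser reading a small
test document rather than from the Blast parser, so that the sketch runs on its
own.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Collects the text found inside hit elements and ignores all other events. */
class HitOnlyHandler extends DefaultHandler {
    // Assumed element name, for illustration only.
    private static final String HIT = "Hit";

    private int depth = 0;                  // > 0 while inside a hit element
    private StringBuilder current;          // text of the hit being read
    private final List<String> hitText = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (depth > 0) {
            depth++;                        // child element of a hit
        } else if (HIT.equals(qName)) {
            depth = 1;                      // entering a hit
            current = new StringBuilder();
        }
        // Anything outside a hit is simply not consumed.
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (depth > 0) {
            current.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (depth > 0 && --depth == 0) {    // leaving the hit
            hitText.add(current.toString().trim());
            current = null;
        }
    }

    List<String> getHitText() {
        return hitText;
    }

    /** Stand-in event source: parses an XML string with the JDK SAX parser. */
    static List<String> collectHits(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        HitOnlyHandler handler = new HitOnlyHandler();
        reader.setContentHandler(handler);
        reader.parse(new InputSource(new StringReader(xml)));
        return handler.getHitText();
    }
}
```

The same handler object would be handed to whatever is producing the SAX
events; only the event source changes, not the handler.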

Does this way of working *not* meet your needs for some reason?

We were aware when putting in the "BlastLikeSAXParser" part of the
framework that whilst the design scales well to dealing with very large data
sets, it may not meet future performance needs, and also that the particular
implementation of the parser may not meet all needs for other unforeseen
reasons.

Thus, if you look in the source of the BlastSAXParser class, you will see a
reference to the Builder pattern.  The idea here would be to pass in "part"
implementations where the sum of the parts defines the whole SAXParser.  This
would provide an elegant way of configuring the parser on the fly to ignore
certain parts of the input.
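
To make that a little more concrete, here is a purely hypothetical sketch of
what "parts plus a builder" could look like. None of these names (SectionPart,
CompositeParser, Builder) exist in the framework, and the "SECTION:text" line
format is a toy stand-in for real Blast output; the only point is that a
section with no part supplied is skipped.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical "part": handles the lines of one section of the input. */
interface SectionPart {
    void handle(String line, List<String> events);
}

/** Hypothetical whole parser: the sum of whichever parts were supplied. */
class CompositeParser {
    private final Map<String, SectionPart> parts;

    private CompositeParser(Map<String, SectionPart> parts) {
        this.parts = parts;
    }

    /** Toy input: each line is "SECTION:text". Sections without a part are ignored. */
    List<String> parse(List<String> lines) {
        List<String> events = new ArrayList<>();
        for (String line : lines) {
            int sep = line.indexOf(':');
            if (sep < 0) {
                continue;                   // not in the toy format, skip
            }
            SectionPart part = parts.get(line.substring(0, sep));
            if (part != null) {
                part.handle(line.substring(sep + 1), events);
            }
        }
        return events;
    }

    /** Builder: the caller chooses which parts make up the parser. */
    static class Builder {
        private final Map<String, SectionPart> parts = new LinkedHashMap<>();

        Builder with(String section, SectionPart part) {
            parts.put(section, part);
            return this;
        }

        CompositeParser build() {
            return new CompositeParser(new LinkedHashMap<>(parts));
        }
    }
}
```

A caller interested only in hits would then build the parser with a single
part, e.g. `new CompositeParser.Builder().with("HIT", hitPart).build()`, and
the header and summary sections would never be processed.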

We didn't put this in, mainly because there are other things to do in life than
write parsers.  However, if we are now at a stage where some refactoring around
the idea of providing Builder type functionality for parsers is needed, we can
look at doing something along these lines.

What do you think - I may have completely missed the point of what you're trying
to do.

Simon
--
Simon M. Brocklehurst, Ph.D.
Head of Bioinformatics & Advanced IS
Cambridge Antibody Technology
The Science Park, Melbourn, Cambridgeshire, UK
http://www.CambridgeAntibody.com/
mailto:simon.brocklehurst@CambridgeAntibody.com