[Biojava-l] Adventures in BlastLikeSax parsing

Simon Brocklehurst simon.brocklehurst@CambridgeAntibody.com
Wed, 21 Mar 2001 11:03:32 +0000


"Schreiber, Mark" wrote:

> Hi,
>
> How much variance is there between blast versions and how well does the
> SaxLikeBlastParser cope? Is there a way to produce a "standard" blast
> output?
>

Hi,

That's BlastLikeSAXParser ;-)  There are significant differences in the output
for blastn, blastp, tblastx etc, and between NCBI Blast, and WU Blast.  The
details of precisely what versions are supported are in the JavaDocs (or at
least they're supposed to be!).

In terms of incrementally numbered versions of blast (e.g. 2.0.11), the
SAXParser should work with the supported versions.  It might also work with
later versions of NCBI Blast (using the lazy mode); we have not yet come across
any problems doing so.  The reason the parser targets specific numbered
versions is that the authors of the software make no guarantees about the
output staying consistent across versions.

The *idea* is that the parser copes perfectly with all of these differences in a
transparent manner.  That doesn't mean there aren't bugs of course - all I can
say is that it works for what we use it for.

> Is there a way to produce a "standard" blast output?

Not sure what you mean - one of the reasons for designing the system the way we
did was indeed to standardise the output produced by different programs.  So,
you can trivially produce a "standard" XML output from the different pieces of
software.
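
As a rough illustration of why this helps (a sketch only, not the actual
BioJava code): downstream code can be written against plain SAX events and
never needs to know which program produced the report.  The example assumes the
parser can be driven through the standard org.xml.sax.XMLReader interface; the
element name "Hit" and the attribute "id" are placeholders made up for the
example, not the real element names, and the JDK's own XML parser stands in for
the Blast parser so that the snippet runs on its own.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Collects the "id" attribute of every <Hit> element seen in the event stream.
// "Hit" and "id" are hypothetical names used only for this illustration.
public class HitCollector extends DefaultHandler {
    private final List<String> hitIds = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) {
        if ("Hit".equals(qName)) {
            hitIds.add(atts.getValue("id"));
        }
    }

    public List<String> getHitIds() {
        return hitIds;
    }

    // Works with any XMLReader: the handler only ever sees SAX events.
    public static List<String> collect(XMLReader reader, InputSource source)
            throws Exception {
        HitCollector handler = new HitCollector();
        reader.setContentHandler(handler);
        reader.parse(source);
        return handler.getHitIds();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in input; a real run would feed program output to the parser.
        String xml = "<Report><Hit id=\"P12345\"/><Hit id=\"Q67890\"/></Report>";
        XMLReader reader =
            SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        System.out.println(
            collect(reader, new InputSource(new StringReader(xml))));
    }
}
```

Swapping in a different XMLReader implementation would leave the handler
untouched, which is the point of standardising on the event stream.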

We are intending to put in some XSLT stuff to allow conversion of the XML to
nicely formatted HTML.
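
To give a feel for what that conversion could look like, here is a minimal
sketch using the standard javax.xml.transform API.  Nothing here is the planned
stylesheet: the input document, the "Report"/"Hit" element names and the "id"
attribute are all invented for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ReportToHtml {
    // Hypothetical stylesheet: turns each <Hit id="..."/> into a list item.
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"html\" indent=\"no\"/>"
        + "<xsl:template match=\"/Report\">"
        + "<ul><xsl:for-each select=\"Hit\">"
        + "<li><xsl:value-of select=\"@id\"/></li>"
        + "</xsl:for-each></ul>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet to an XML string and returns the HTML as text.
    public static String toHtml(String xml) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)),
                              new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Made-up input standing in for the parser's XML output.
        String xml = "<Report><Hit id=\"P12345\"/><Hit id=\"Q67890\"/></Report>";
        System.out.println(toHtml(xml));
    }
}
```

Because the XML is the same whichever program produced the report, one
stylesheet of this kind would in principle cover all of them.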

Not sure if that answers your question or not.

S.
--
Simon M. Brocklehurst, Ph.D.
Head of Bioinformatics & Advanced IS
Cambridge Antibody Technology
The Science Park, Melbourn, Cambridgeshire, UK
http://www.CambridgeAntibody.com/
mailto:simon.brocklehurst@CambridgeAntibody.com