[Biojava-l] Parsing MegaBLAST output files?

James Diggans jdiggans at excelsiortech.com
Tue Nov 23 00:08:02 EST 2004

Thanks for the reply, Mark. Setting the parser to be lazy (just before
the parse; it shouldn't matter where I do this as long as it's prior to
the parse, correct?) doesn't seem to help -- I still get the same SAX
exception. The MegaBLAST output seems, to my eye, to be identical to
that of blastn minus the header line:

	MEGABLAST 2.2.10 [Oct-19-2004]

Looking at the code for BlastLikeSAXParser, it seems, even in lazy mode,
to require that the header line contain at least a name with which it is
familiar (lazy just turns off interest in the version number). Would a
fix be as simple as adding 'MEGABLAST' to the list of acceptable names?
I can provide any interested dev w/ a sample output file from the
above-mentioned version of MegaBLAST.

If no one's interested, I'll follow up but it'll take me a lot longer
than those already familiar w/ the BioJava parser code.
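
In the meantime, a possible stopgap (a minimal sketch, untested against
BioJava) would be to rewrite the program name on the header line before
handing the text to the parser. It assumes the rest of the MegaBLAST
output really does match blastn, and that the parser in lazy mode accepts
the name BLASTN whatever the version. The class and method names here are
hypothetical and not part of BioJava.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of BioJava.
final class MegaBlastHeaderRewriter {

    // Reads all of 'in' and returns its text with the first line that
    // starts with "MEGABLAST " changed to start with "BLASTN " instead.
    // Every other line is copied through unchanged. Line endings are
    // normalised to '\n'.
    static String rewriteHeader(Reader in) {
        StringBuilder out = new StringBuilder();
        boolean rewritten = false;
        try (BufferedReader br = new BufferedReader(in)) {
            String line;
            while ((line = br.readLine()) != null) {
                // Only the first matching line is touched; trim() allows
                // for leading whitespace before the program name.
                if (!rewritten && line.trim().startsWith("MEGABLAST ")) {
                    line = line.replaceFirst("MEGABLAST", "BLASTN");
                    rewritten = true;
                }
                out.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

The result could then be wrapped in a StringReader and passed to the
parser via new InputSource(reader) in place of the FileInputStream. This
holds the whole report in memory, so it is only reasonable for modest
output files, and the parsed results would still need checking by hand.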

mark.schreiber at group.novartis.com wrote:
| Hello -
| MegaBLAST is not officially supported. This doesn't mean it won't work;
| it just means we don't know whether it will. If it isn't too different
| from normal BLAST it probably will.
| The BlastLikeSAXParser has two modes, Lazy and Strict. If you call
| setModeLazy() before parsing it won't care if it doesn't recognise the
| format as one that is tried and tested and will attempt to parse it
| anyway. You should carefully check a few results though to make sure it
| is going well. If things work let us know so we can add MegaBLAST to the
| list of trusted programs.
| Hope this helps,
| Mark
| All, I'm attempting to use BioJava to parse the output from NCBI's
| commandline MegaBLAST and receiving an error:
| 'Could not recognise the format of this file as one supported by the
| framework.'
| in a SAXException thrown by BlastLikeSAXParser. An old post to the
| mailing list:
| http://www.biojava.org/pipermail/biojava-dev/2002-October/000150.html
| seems to indicate that this was fixed long ago via this commit to CVS:
| The MegaBLAST file I'm trying to parse is clean and my attempt at a
| parse consists of (largely pulled from the recipe from BioJava in Anger):
| ------------------
| InputStream is = new FileInputStream(blastResult);
| BlastLikeSAXParser parser = new BlastLikeSAXParser();
| SeqSimilarityAdapter adapter = new SeqSimilarityAdapter();
| parser.setContentHandler(adapter);
| alignmentResults = new ArrayList();
| SearchContentHandler builder = new
|                  BlastLikeSearchBuilder(alignmentResults,
|                                  new DummySequenceDB("queries"),
|                                  new DummySequenceDBInstallation());
| adapter.setSearchContentHandler(builder);
| parser.parse(new InputSource(is));
| ------------------
| Any ideas on why I'm getting the SAXException? Thanks ...
| -j
