[Biojava-l] Parsing MegaBLAST output files?

James Diggans jdiggans at excelsiortech.com
Tue Nov 23 00:08:02 EST 2004


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Thanks for the reply, Mark. Setting the parser to be lazy (just before
the parse; it shouldn't matter where I do this as long as it's prior to
the parse, correct?) doesn't seem to help -- I still get the same SAX
exception. The MegaBLAST output seems, to my eye, to be identical to
that of blastn minus the header line:

	MEGABLAST 2.2.10 [Oct-19-2004]

Looking at the code for BlastLikeSAXParser, it seems, even in lazy mode,
to require that the header line contain at least a name with which it is
familiar (lazy just turns off interest in the version number). Would a
fix be as simple as adding 'MEGABLAST' to the list of acceptable names?
I can provide any interested dev w/ a sample output file from the
above-mentioned version of MegaBLAST.
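
The behaviour described above can be sketched roughly as follows. This is a hypothetical, standard-library-only illustration and not the actual BlastLikeSAXParser code: the class name, the two whitelists, and the header regex are all assumptions made up for the example. It only shows why lazy mode alone would not help if the program name is checked in both modes, and why adding 'MEGABLAST' to the list of names might be enough.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; NOT the BioJava implementation.
final class HeaderCheckSketch {
    // Assumed whitelists, invented for this example.
    static final Set<String> KNOWN_PROGRAMS = new HashSet<String>(
            Arrays.asList("BLASTN", "BLASTP", "BLASTX", "TBLASTN", "TBLASTX"));
    static final Set<String> TESTED_VERSIONS = new HashSet<String>(
            Arrays.asList("2.2.10"));

    // Matches header lines such as "MEGABLAST 2.2.10 [Oct-19-2004]".
    static final Pattern HEADER = Pattern.compile(
            "^\\s*([A-Z]+)\\s+(\\d+(?:\\.\\d+)*)\\s*(?:\\[.*\\])?\\s*$");

    // Returns true if a parser following this scheme would accept the header.
    static boolean accepts(String headerLine, boolean lazy) {
        Matcher m = HEADER.matcher(headerLine);
        if (!m.matches()) {
            return false;
        }
        String program = m.group(1);
        String version = m.group(2);
        // The program name is checked in both modes.
        if (!KNOWN_PROGRAMS.contains(program)) {
            return false;
        }
        // Lazy mode only skips the version check.
        return lazy || TESTED_VERSIONS.contains(version);
    }

    public static void main(String[] args) {
        String header = "MEGABLAST 2.2.10 [Oct-19-2004]";
        System.out.println("lazy, before fix: " + accepts(header, true));
        KNOWN_PROGRAMS.add("MEGABLAST"); // the proposed one-line fix
        System.out.println("lazy, after fix:  " + accepts(header, true));
    }
}
```

Under these assumptions the MegaBLAST header is rejected even in lazy mode until the name is added to the known-programs set. Whether the real parser needs further changes beyond the name check would have to be confirmed against the BioJava source.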

If no one's interested, I'll look into it myself, but it'll take me a
lot longer than those already familiar w/ the BioJava parser code.

Thanks all,
- -j

mark.schreiber at group.novartis.com wrote:
| Hello -
|
| MegaBLAST is not officially supported. This doesn't mean it won't work; it
| just means we don't know if it will work. If it isn't too different from
| normal blast it probably will.
|
| The BlastLikeSAXParser has two modes. Lazy and Strict. If you call
| setModeLazy() before parsing it won't care if it doesn't recognise the
| format as one that is tried and tested and will attempt to parse it
| anyway. You should carefully check a few results though to make sure it is
| going well. If things work let us know so we can add MegaBLAST to the list
| of trusted programs.
|
| Hope this helps,
|
| Mark
|
|
| James Diggans <jdiggans at excelsiortech.com>
| Sent by: biojava-l-bounces at portal.open-bio.org
| 11/22/2004 02:38 PM
|
|
|         To:     BioJava <biojava-l at biojava.org>
|         cc:     (bcc: Mark Schreiber/GP/Novartis)
|         Subject:        [Biojava-l] Parsing MegaBLAST output files?
|
|
|
|
| All, I'm attempting to use BioJava to parse the output from NCBI's
| commandline MegaBLAST and receiving an error:
|
| 'Could not recognise the format of this file as one supported by the
| framework.'
|
| in a SAXException thrown by BlastLikeSAXParser. An old post to the
| mailing list:
|
| http://www.biojava.org/pipermail/biojava-dev/2002-October/000150.html
|
| seems to indicate that this was fixed long ago via this commit to CVS:
|
|
| http://cvs.biojava.org/cgi-bin/viewcvs/viewcvs.cgi/biojava-live/src/org/biojava/bio/program/ssbind/HeaderStAXHandler.java.diff?r1=1.3&r2=1.4&cvsroot=biojava
|
| The MegaBLAST file I'm trying to parse is clean and my attempt at a
| parse consists of (largely pulled from the recipe from BioJava in Anger):
|
| ------------------
| InputStream is = new FileInputStream(blastResult);
|
| BlastLikeSAXParser parser = new BlastLikeSAXParser();
| SeqSimilarityAdapter adapter = new SeqSimilarityAdapter();
| parser.setContentHandler(adapter);
|
| alignmentResults = new ArrayList();
| SearchContentHandler builder = new
|                  BlastLikeSearchBuilder(alignmentResults,
|                                  new DummySequenceDB("queries"),
|                                  new DummySequenceDBInstallation());
|
| adapter.setSearchContentHandler(builder);
|
| parser.parse(new InputSource(is));
| ------------------
|
| Any ideas on why I'm getting the SAXException? Thanks ...
| -j
|
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3-nr1 (Windows XP)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFBosWy75jgGJzUhNkRAtL+AJ9V6JoMXSdT1AWPuFGMckUiMzFO5ACg2D1r
2R75Y4ElTIBxrMA+Pukgre0=
=Is3P
-----END PGP SIGNATURE-----

