[Biojava-dev] [Fwd: Re: [blast-help] XML output format]

James Diggans jdiggans at excelsiortech.com
Mon Dec 6 23:46:02 EST 2004


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


[Result below from the cool folks @ the NCBI help desk]
I suppose the following makes sense but I can't say it would've been the
tack I'd taken. Historical searching patterns don't seem much of an
excuse for completely invalid XML even against your own DTD. MegaBLAST
is much better behaved but, as it's the only one, it's not worth moving
any portion of BlastXMLParserFacade code around I imagine. I may still
bug folks to help me write one once I'm done w/ my current project as I
think it would be useful when NCBI eventually moves blastall to the
proper DTD.
- -j

- -------- Original Message --------
Subject: Re: [blast-help] XML output format
Date: Mon, 6 Dec 2004 14:55:31 -0500 (EST)
From: Susan Dombrowski <dombrows at ncbi.nlm.nih.gov>
To: James Diggans <jdiggans at excelsiortech.com>
CC: blast-help at ncbi.nlm.nih.gov
References: <41B3BDED.6000706 at excelsiortech.com>

Dear James,

Per one of the BLAST programmers,

"Historically the multiple query input in blastall was treated as a
request for multiple independent complete searches. On the other hand
megablast from its first implementation combines multiple queries into a
single concatenated sequence. Because of that, the XML output for
megablast was adjusted to allow for a single report for all queries
together. A similar change was not made for blastall XML output."

It is quite possible, though, that we may implement this in blastall in
the future, but at the moment I cannot say when.

Regards,

Susan

****************************
Susan M. Dombrowski, Ph.D.
NCBI/ NLM/ NIH
8600 Rockville Pike
Building 38A, 3S314-L
Bethesda, MD 20894


On Sun, 5 Dec 2004, James Diggans wrote:

|
| I recently downloaded the 2.2.10 release and have found that blastall
| and megablast (using the -m 3 output format parameter) produce different
| XML output. Blastall produces separate, complete documents for each
| query sequence and concatenates them all together. Megablast makes use
| of the Iteration tag to instead group all output for multi-query searches
| together under a single, well-formed BlastOutput tag set.
|
| The latter is far easier to parse. With blastall I need a second
| post-processing step that wraps the multiple, concatenated BlastOutput
| tag sets in a fake parent tag set, and I have to turn off validation
| against the NCBI DTD. Given this, I have two questions:
|
| 1) Why does megablast produce valid XML output while blastall does not?
|
| 2) Is NCBI currently working to unify the output formats for the -m 3
| option and, if so, when might we expect blastall to support use of the
| Iteration tagset?
|
| My thanks for all that you do,
| -j
|
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.3-nr1 (Windows XP)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFBtTWK75jgGJzUhNkRAuIvAJkB0McMRIaNvhejoHKfbG5JSTd/VQCgsk5N
x1IChLcPQM2yzdn042d9jW8=
=OCih
-----END PGP SIGNATURE-----
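
The post-processing step described in the quoted message (wrapping the
concatenated BlastOutput documents in a fake parent tag set and skipping
DTD validation) might look roughly like the sketch below. It is not the
BlastXMLParserFacade code and not anything NCBI ships: the function names,
the BlastReports wrapper tag, and the sample input are all hypothetical,
and it assumes each concatenated report starts with a plain XML declaration
and a DOCTYPE line with no internal subset.

```python
import re
import xml.etree.ElementTree as ET

# Matches the XML declaration and DOCTYPE that precede each concatenated report.
# Assumption: the DOCTYPE has no internal subset, so it contains no nested '>'.
_PROLOG = re.compile(r"<\?xml[^>]*\?>|<!DOCTYPE[^>]*>")


def wrap_concatenated_reports(text, root_tag="BlastReports"):
    """Strip the per-report prologs and wrap everything in one fake parent tag.

    root_tag is a made-up name; it is not part of the NCBI DTD.
    """
    body = _PROLOG.sub("", text)
    return f"<{root_tag}>{body}</{root_tag}>"


def parse_reports(text):
    """Return one Element per BlastOutput report found in the concatenated text.

    ElementTree does not validate, which matches "turning off validation".
    """
    root = ET.fromstring(wrap_concatenated_reports(text))
    return root.findall("BlastOutput")


# Illustrative input only: two heavily trimmed reports glued together,
# the way multi-query blastall XML output is described above.
_ONE_REPORT = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN" '
    '"NCBI_BlastOutput.dtd">\n'
    "<BlastOutput>"
    "<BlastOutput_query-def>{name}</BlastOutput_query-def>"
    "</BlastOutput>\n"
)
sample = _ONE_REPORT.format(name="query1") + _ONE_REPORT.format(name="query2")

reports = parse_reports(sample)
query_names = [r.findtext("BlastOutput_query-def") for r in reports]
print(len(reports), query_names)
```

Stripping the prologs is what makes the wrapper work: an XML declaration or
DOCTYPE in the middle of a document is itself a well-formedness error, so
adding the parent tag alone would not be enough.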

