[Biojava-l] Blast Parser oddity
Christian.Gruber at biomax.com
Fri Apr 2 08:20:20 EST 2004
I am currently evaluating the XML output of NCBI Blast and the ability
of BioJava to parse it. To this end, I ran the same blastp and blastn
searches twice (i.e. the same sequence against the same database with
the same parameters), once with the standard output and once with XML
output ("-m 7"). I then parsed the files with BlastLikeSAXParser
(standard output) or BlastXMLParserFacade (XML output) and compared
the outcome. Surprisingly, I got two different results.
Here is a list of the fields that are different:
These are all rather important fields, for example subjectId, the
description or the score. Having looked at it, I think that the output
of BlastLikeSAXParser is OK, but that of BlastXMLParserFacade is rotten.
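
For anyone who wants to repeat the comparison, here is a minimal sketch of
a field-by-field diff. It assumes the properties of one hit from each parser
have already been copied into a plain Map of strings; the class name
AnnotationDiff, the diff method and the sample values are made up for
illustration and are not part of BioJava.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.TreeSet;

// Hypothetical helper, not BioJava API: reports the keys on which two
// property maps disagree, including keys present on only one side.
class AnnotationDiff {

    // Returns key -> "left | right" for every differing or one-sided key.
    public static Map<String, String> diff(Map<String, String> left,
                                           Map<String, String> right) {
        // Sorted union of both key sets, so the report order is stable.
        TreeSet<String> keys = new TreeSet<>(left.keySet());
        keys.addAll(right.keySet());

        Map<String, String> differences = new LinkedHashMap<>();
        for (String key : keys) {
            String l = left.get(key);   // null if the key is missing
            String r = right.get(key);
            if (!Objects.equals(l, r)) {
                differences.put(key, l + " | " + r);
            }
        }
        return differences;
    }

    public static void main(String[] args) {
        // Placeholder values only; these are not real parser results.
        Map<String, String> fromText = new LinkedHashMap<>();
        fromText.put("subjectId", "placeholder-id-a");
        fromText.put("score", "100");
        fromText.put("percentageIdentity", "95");

        Map<String, String> fromXml = new LinkedHashMap<>();
        fromXml.put("subjectId", "placeholder-id-b");
        fromXml.put("score", "100");

        diff(fromText, fromXml)
            .forEach((key, values) -> System.out.println(key + ": " + values));
    }
}
```

A key that appears on only one side shows up with "null" on the other, which
makes missing Annotation entries as visible as differing values.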
What now? I think that the parsing results are supposed to be identical
(as far as possible), but changing the parser might break existing code.
If that is OK with you, I'd like to volunteer to change BlastXMLParserFacade
so that its output more closely resembles that of BlastLikeSAXParser.
By the way, is there a guaranteed set of Annotation entries for these
different classes? For example, I find percentageIdentity, but no