[Biojava-l] Blast Parser oddity

Wed Apr 7 10:21:32 EDT 2004

Hi,

The problem is likely with the BLAST output itself and not the parsers. This was discussed on the bioperl list in January, and on the biojava list back in 2002. Check out some of the postings at:

http://www.biojava.org/pipermail/biojava-l/2002-March/002311.html
http://bioperl.org/pipermail/bioperl-l/2004-January/014749.html

There are more postings in the bioperl archive for January as well, if you need more information:
http://bioperl.org/pipermail/bioperl-l/2004-January/

Note also that Jason pointed out other differences:

http://bioperl.org/pipermail/bioperl-l/2004-February/014769.html
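
To narrow down exactly which fields differ between the two parses, a field-by-field diff along these lines may help. This is only a sketch: it assumes you have already copied each parser's result into a plain Map of field name to value yourself (no BioJava calls are shown), and the class name ParseResultDiff, the method diff, and the values in main are all made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.TreeSet;

// Hypothetical helper, not part of BioJava.
public class ParseResultDiff {

    /**
     * Returns, for every field whose value differs between the two maps,
     * an entry "field -> left | right". A field missing on one side is
     * reported with the text "null" on that side.
     */
    public static Map<String, String> diff(Map<String, String> left,
                                           Map<String, String> right) {
        // Union of the field names from both sides, in sorted order.
        TreeSet<String> fields = new TreeSet<>(left.keySet());
        fields.addAll(right.keySet());

        Map<String, String> differences = new LinkedHashMap<>();
        for (String field : fields) {
            String l = left.get(field);
            String r = right.get(field);
            if (!Objects.equals(l, r)) {
                differences.put(field, l + " | " + r);
            }
        }
        return differences;
    }

    public static void main(String[] args) {
        // Placeholder values only; fill these from your own parse results.
        Map<String, String> fromText = new LinkedHashMap<>();
        fromText.put("subjectId", "ID_FROM_TEXT_OUTPUT");
        fromText.put("score", "100");

        Map<String, String> fromXml = new LinkedHashMap<>();
        fromXml.put("subjectId", "ID_FROM_XML_OUTPUT");
        fromXml.put("score", "100");

        // Print one line per differing field.
        diff(fromText, fromXml).forEach(
            (field, values) -> System.out.println(field + ": " + values));
    }
}
```

Running it on the same hit from both outputs should make it easier to tell which differences come from the BLAST output itself and which are introduced by the parsers.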

HTH,

-mat

> -----Original Message-----
> From: biojava-l-bounces at portal.open-bio.org
> [mailto:biojava-l-bounces at portal.open-bio.org] On Behalf Of
> Christian Gruber
> Sent: Friday, April 02, 2004 7:20 AM
> To: biojava-l at open-bio.org
> Subject: [Biojava-l] Blast Parser oddity
> 
> 
> Hi!
> 
> I am currently evaluating the XML output of NCBI Blast, and the ability
> of BioJava to parse this output. For this purpose, I have run the
> identical blastp and blastn searches twice (i.e. the same sequence
> against the same database with the same parameters), once with the
> standard output and once with XML output ("-m 7"). I then parsed the
> files with either BlastLikeSAXParser (standard output) or
> BlastXMLParserFacade (XML output) and compared the outcome.
> Surprisingly, I got two different results...
> 
> Here is a list of the fields that are different:
> 
> SeqSimilaritySearchResult:
>    Annotation:
>      databaseId
>      program
>      queryId
>      version
> 
> SeqSimilaritySearchHit:
>    subjectId
>    queryStrand
>    subjectStrand
>    Annotation:
>      subjectDescription
>      subjectId
> 
> 
> SeqSimilaritySearchSubHit:
>    queryStrand
>    subjectStrand
>    score
>    numberOfIdentities
>    numberOfPositives
>    percentageIdentity
>    score
> 
> These are all rather important fields, for example subjectId, the
> description or score. After looking at it, I think that the output of
> BlastLikeSAXParser is OK, but that of BlastXMLParserFacade is rotten.
> 
> What now? I think that the parsing results are supposed to be identical
> (as far as possible), but changing the parser might break existing code.
> If it's OK with you, I'd like to volunteer to change BlastXMLParserFacade
> so that its output more closely resembles that of BlastLikeSAXParser.
> 
> By the way, is there a guaranteed set of Annotation entries for these 
> different classes? For example, I find percentageIdentity, but no 
> percentagePositives.
> 
> Greetings,
> Christian