[Biojava-l] parsing BLAST result
Richard Holland
holland at eaglegenomics.com
Wed Jul 23 18:20:53 UTC 2008
Your hits consist of numerous sub-hits, which means that the start and
end positions reported on the hits themselves are not meaningful. Each
sub-hit carries its own coordinates and e-value; you can get them by
doing this:
// existing code to list the hits
for (Iterator k = result.getHits().iterator(); k.hasNext(); ) {
    SeqSimilaritySearchHit hit = (SeqSimilaritySearchHit)k.next();
    System.out.print("\tmatch: "+hit.getSubjectID());
    System.out.print("\tSubSeqStart: "+hit.getSubjectStart());
    System.out.print("\tSubSeqStop: "+hit.getSubjectEnd());
    System.out.println("\te score: "+hit.getEValue());
    // new code to get the subhits
    System.out.println("\t\t Subhits:");
    for (Iterator j = hit.getSubHits().iterator(); j.hasNext(); ) {
        SeqSimilaritySearchSubHit subhit =
            (SeqSimilaritySearchSubHit)j.next();
        System.out.print("\t\tSubSeqStart: "+subhit.getSubjectStart());
        System.out.print("\t\tSubSeqStop: "+subhit.getSubjectEnd());
        System.out.println("\t\te score: "+subhit.getEValue());
    }
}
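
To illustrate why hit-level positions can look like nonsense, here is a
minimal, self-contained sketch. It does not use BioJava: SubHit is a
hypothetical stand-in class, and the sketch assumes (this is not confirmed
anywhere in this thread) that hit-level coordinates behave like the overall
minimum and maximum across all sub-hits. The values 3320, 2952643, 610956
and 610367 come from the question; the remaining coordinates and the
non-zero e-values are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class SubHitSpanDemo {

    /** Hypothetical stand-in for a sub-hit (HSP); NOT a BioJava type. */
    static final class SubHit {
        final int subjectStart;
        final int subjectEnd;
        final double eValue;

        SubHit(int subjectStart, int subjectEnd, double eValue) {
            this.subjectStart = subjectStart;
            this.subjectEnd = subjectEnd;
            this.eValue = eValue;
        }
    }

    /** Returns {min, max} over every sub-hit coordinate (assumed hit-level behaviour). */
    static int[] overallSpan(List<SubHit> subHits) {
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (SubHit s : subHits) {
            min = Math.min(min, Math.min(s.subjectStart, s.subjectEnd));
            max = Math.max(max, Math.max(s.subjectStart, s.subjectEnd));
        }
        return new int[] { min, max };
    }

    /** Returns the sub-hit with the lowest e-value; the first one wins on ties. */
    static SubHit bestSubHit(List<SubHit> subHits) {
        SubHit best = null;
        for (SubHit s : subHits) {
            if (best == null || s.eValue < best.eValue) {
                best = s;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Illustrative sub-hits: one strong reverse-strand match plus two weak ones.
        List<SubHit> subHits = Arrays.asList(
            new SubHit(610956, 610367, 0.0),
            new SubHit(3320, 3401, 1e-5),
            new SubHit(2952560, 2952643, 2e-4));

        int[] span = overallSpan(subHits);
        System.out.println("hit-level span: " + span[0] + ".." + span[1]);

        SubHit best = bestSubHit(subHits);
        System.out.println("best sub-hit:   " + best.subjectStart + ".." + best.subjectEnd);
    }
}
```

Under those assumptions the combined span covers almost the whole subject,
while the best-scoring sub-hit gives the positions you expected. In your
parser the same selection could be done inside the inner loop by keeping
the sub-hit with the smallest getEValue().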
cheers,
Richard
2008/7/23 Charles Imbusch <charles at imbusch.net>:
> Hello,
>
> for a project I have to parse Blast output files. To do this I used the code
> provided on this page:
>
> http://biojava.org/wiki/BioJava:CookBook:Blast:Parser
>
> I'm interested in the start and stop positions of the subject I align with,
> so I adjusted the code a bit so that it looks like:
>
> //list the hits
> for (Iterator k = result.getHits().iterator(); k.hasNext(); ) {
>     SeqSimilaritySearchHit hit = (SeqSimilaritySearchHit)k.next();
>     System.out.print("\tmatch: "+hit.getSubjectID());
>     System.out.print("\tSubSeqStart: "+hit.getSubjectStart());
>     System.out.print("\tSubSeqStop: "+hit.getSubjectEnd());
>     System.out.println("\te score: "+hit.getEValue());
> }
>
> I execute "java BlastParserOriginal S2431-F.fasta.txt" and have a look at
> the best hit:
> ...
> match: 48_scaffold.txt SubSeqStart: 3320 SubSeqStop: 2952643 e score: 0.0
> ...
> The subject id is correct but the numbers are just nonsense. It should be
> 610956 for the start and 610367 for the end position.
>
> This doesn't happen with all Blast result files, only with some. Is there
> a solution for that? How do you parse the Blast files?
>
> I just uploaded the Blast output to http://charles.imbusch.net/tmp/
>
> Any answer is appreciated.
>
> Cheers,
> Charles
>
--
Richard Holland
Bioinformatics Software Developer
Eagle Genomics
http://www.eaglegenomics.com/