[Biojava-l] parsing BLAST result

Richard Holland holland at eaglegenomics.com
Wed Jul 23 18:20:53 UTC 2008


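Your hits consist of numerous sub-hits (HSPs), which means that the values
reported on the hits themselves, including their start and end positions,
don't contain meaningful data. Each sub-hit carries its own coordinates and
e-value, and you can get them like this: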

        // imports this fragment needs (add any the CookBook parser lacks):
        // import java.util.Iterator;
        // import org.biojava.bio.search.SeqSimilaritySearchHit;
        // import org.biojava.bio.search.SeqSimilaritySearchSubHit;

        // existing code to list the hits
        for (Iterator k = result.getHits().iterator(); k.hasNext(); ) {
          SeqSimilaritySearchHit hit = (SeqSimilaritySearchHit)k.next();
          System.out.print("\tmatch: "+hit.getSubjectID());
          // hit-level start/end are not meaningful when a hit has several sub-hits
          System.out.print("\tSubSeqStart: "+hit.getSubjectStart());
          System.out.print("\tSubSeqStop:  "+hit.getSubjectEnd());
          System.out.println("\te score: "+hit.getEValue());

          // new code to get the subhits; each one has its own coordinates
          System.out.println("\t\t Subhits:");
          for (Iterator j = hit.getSubHits().iterator(); j.hasNext(); ) {
             SeqSimilaritySearchSubHit subhit = (SeqSimilaritySearchSubHit)j.next();
             System.out.print("\t\tSubSeqStart: "+subhit.getSubjectStart());
             System.out.print("\t\tSubSeqStop:  "+subhit.getSubjectEnd());
             System.out.println("\t\te score: "+subhit.getEValue());
          }
        }

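If you only want one pair of coordinates per hit, a minimal self-contained sketch of one way to do it is below, assuming you take the sub-hit with the lowest e-value and treat start > end as a minus-strand match (in raw BLAST text output, minus-strand subject coordinates run high to low, which fits the 610956 to 610367 expected in the original question). SubHit, bestSubHit and overallSpan are hypothetical names standing in for BioJava's SeqSimilaritySearchSubHit objects, and the sub-hit values in main are made up. overallSpan only shows how a range taken over several far-apart sub-hits stops describing any single alignment; it is not a claim about how BioJava computes hit-level coordinates.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: summarise a BLAST hit by its best sub-hit (HSP)
 * instead of relying on hit-level coordinates. SubHit is a stand-in class,
 * NOT the BioJava SeqSimilaritySearchSubHit type.
 */
public class BestSubHitSketch {

    /** Minimal stand-in for one sub-hit: subject coordinates plus e-value. */
    static final class SubHit {
        final int subjectStart;
        final int subjectEnd;
        final double eValue;

        SubHit(int subjectStart, int subjectEnd, double eValue) {
            this.subjectStart = subjectStart;
            this.subjectEnd = subjectEnd;
            this.eValue = eValue;
        }

        /** Assumption: coordinates running backwards mean a minus-strand match. */
        boolean isReverse() {
            return subjectStart > subjectEnd;
        }
    }

    /** Returns the sub-hit with the lowest e-value; ties keep the first one seen. */
    static SubHit bestSubHit(List<SubHit> subHits) {
        if (subHits.isEmpty()) {
            throw new IllegalArgumentException("hit has no sub-hits");
        }
        SubHit best = subHits.get(0);
        for (SubHit s : subHits) {
            if (s.eValue < best.eValue) {
                best = s;
            }
        }
        return best;
    }

    /** Lowest to highest subject coordinate covered by all sub-hits together. */
    static int[] overallSpan(List<SubHit> subHits) {
        int lo = Integer.MAX_VALUE;
        int hi = Integer.MIN_VALUE;
        for (SubHit s : subHits) {
            lo = Math.min(lo, Math.min(s.subjectStart, s.subjectEnd));
            hi = Math.max(hi, Math.max(s.subjectStart, s.subjectEnd));
        }
        return new int[] {lo, hi};
    }

    public static void main(String[] args) {
        // Made-up sub-hits for a single hit, for illustration only.
        List<SubHit> subHits = new ArrayList<>();
        subHits.add(new SubHit(610956, 610367, 0.0));
        subHits.add(new SubHit(1200, 1350, 1e-5));
        subHits.add(new SubHit(2500000, 2500180, 2e-3));

        SubHit best = bestSubHit(subHits);
        int[] span = overallSpan(subHits);
        // prints "best sub-hit: 610956..610367 (minus strand)"
        System.out.println("best sub-hit: " + best.subjectStart + ".." + best.subjectEnd
                + (best.isReverse() ? " (minus strand)" : " (plus strand)"));
        // prints "span of all sub-hits: 1200..2500180"
        System.out.println("span of all sub-hits: " + span[0] + ".." + span[1]);
    }
}
```

In the real parser you would build the list from hit.getSubHits() and read getSubjectStart(), getSubjectEnd() and getEValue() from each SeqSimilaritySearchSubHit; picking by e-value is just one reasonable choice, and you may prefer the highest score instead.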

cheers,
Richard


2008/7/23 Charles Imbusch <charles at imbusch.net>:
> Hello,
>
> For a project I have to parse BLAST output files. To do this I used the code
> provided on this page:
>
> http://biojava.org/wiki/BioJava:CookBook:Blast:Parser
>
> I'm interested in the start and stop positions on the subject sequence I
> align against, so I adjusted the code slightly to look like this:
>
>       //list the hits
>       for (Iterator k = result.getHits().iterator(); k.hasNext(); ) {
>         SeqSimilaritySearchHit hit = (SeqSimilaritySearchHit)k.next();
>         System.out.print("\tmatch: "+hit.getSubjectID());
>         System.out.print("\tSubSeqStart: "+hit.getSubjectStart());
>         System.out.print("\tSubSeqStop:  "+hit.getSubjectEnd());
>         System.out.println("\te score: "+hit.getEValue());
>       }
>
> I ran "java BlastParserOriginal S2431-F.fasta.txt" and looked at the
> best hit:
> ...
> match: 48_scaffold.txt    SubSeqStart: 3320    SubSeqStop:  2952643    e score: 0.0
> ...
> The subject ID is correct, but the numbers are nonsense: the start position
> should be 610956 and the end position 610367.
>
> This doesn't happen with all BLAST result files, only with some. Is there a
> solution for that? How do you parse BLAST files?
>
> I just uploaded the Blast output to http://charles.imbusch.net/tmp/
>
> Any answer is appreciated.
>
> Cheers,
>  Charles
>
> _______________________________________________
> Biojava-l mailing list  -  Biojava-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-l
>



-- 
Richard Holland
Bioinformatics Software Developer
Eagle Genomics
http://www.eaglegenomics.com/


