[Biojava-l] parsing BLAST result

Charles Imbusch charles at imbusch.net
Wed Jul 23 09:40:30 UTC 2008


Hello,

For a project I have to parse Blast output files. To do this I used the code
provided on this page:

http://biojava.org/wiki/BioJava:CookBook:Blast:Parser

I'm interested in the start and stop positions of the subject I align with,
so I adjusted the code a bit so that it looks like this:

        //list the hits
        for (Iterator k = result.getHits().iterator(); k.hasNext(); ) {
          SeqSimilaritySearchHit hit = (SeqSimilaritySearchHit)k.next();
          System.out.print("\tmatch: "+hit.getSubjectID());
          System.out.print("\tSubSeqStart: "+hit.getSubjectStart());
          System.out.print("\tSubSeqStop:  "+hit.getSubjectEnd());
          System.out.println("\te score: "+hit.getEValue());
        }

I execute "java BlastParserOriginal S2431-F.fasta.txt" and have a look at the
best hit:
...
match: 48_scaffold.txt    SubSeqStart: 3320    SubSeqStop:  2952643    e score: 0.0
...
The subject ID is correct, but the positions are just nonsense. They should
be 610956 for the start and 610367 for the end position.
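
One possible explanation, not confirmed anywhere in this message, is that the
hit-level getSubjectStart() and getSubjectEnd() summarize all aligned segments
(HSPs) of a hit rather than the best one, so they report the smallest and
largest subject coordinate seen. The sketch below is plain Java with no
BioJava dependency. The Hsp class, the two envelope methods, and the two extra
segments are hypothetical and were chosen only so that the result reproduces
the numbers printed above; only the 610956/610367 pair comes from this message.

```java
import java.util.Arrays;
import java.util.List;

class HitEnvelopeSketch {

    /** Hypothetical stand-in for one aligned segment (HSP) on the subject. */
    static final class Hsp {
        final int subjectStart;
        final int subjectEnd; // smaller than subjectStart on the reverse strand

        Hsp(int subjectStart, int subjectEnd) {
            this.subjectStart = subjectStart;
            this.subjectEnd = subjectEnd;
        }
    }

    /** Smallest subject coordinate touched by any segment. */
    static int envelopeStart(List<Hsp> hsps) {
        int min = Integer.MAX_VALUE;
        for (Hsp h : hsps) {
            min = Math.min(min, Math.min(h.subjectStart, h.subjectEnd));
        }
        return min;
    }

    /** Largest subject coordinate touched by any segment. */
    static int envelopeEnd(List<Hsp> hsps) {
        int max = Integer.MIN_VALUE;
        for (Hsp h : hsps) {
            max = Math.max(max, Math.max(h.subjectStart, h.subjectEnd));
        }
        return max;
    }

    public static void main(String[] args) {
        List<Hsp> hsps = Arrays.asList(
            new Hsp(610956, 610367),    // expected best alignment (from this message)
            new Hsp(3320, 3900),        // hypothetical extra segment
            new Hsp(2952100, 2952643)); // hypothetical extra segment

        // Prints the same pair of numbers as the parser output above.
        System.out.println("SubSeqStart: " + envelopeStart(hsps)
            + "\tSubSeqStop: " + envelopeEnd(hsps));
    }
}
```

If that is what happens here, the per-alignment positions would have to be read
from the sub-hits instead: in BioJava 1.x, hit.getSubHits() returns a list of
SeqSimilaritySearchSubHit objects, each with its own getSubjectStart() and
getSubjectEnd().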

This doesn't happen with all Blast result files, only with some. Is there a
solution for that? How do you parse the Blast files?

I just uploaded the Blast output to http://charles.imbusch.net/tmp/

Any answer is appreciated.

Cheers,
  Charles



