[Biojava-l] Re: blast parsing and empty hits

Petri Pehkonen pehkonen@messi.uku.fi
Thu, 3 Oct 2002 18:39:24 +0300 (WET)


I had the same problem earlier. When I tried to debug it, I noticed that
the parser simply skips over query sequences with no hits in the output
file, so the unmatched queries cannot be reached from your ContentHandler
class at all. Or at least that is what I think...

In my case the problem was easy to solve. I had developed a graphical
interface for running BLAST and handling the results in many ways (which I
won't go into here), so I also parsed the query file into a database
before parsing the actual output. Once you have all BLAST queries in one
database table and the matched queries in another, you can easily get the
unmatched BLAST queries with an SQL query.
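
Without a database, the same set difference can be sketched in memory.
This is only an illustration of the idea: the class and method names
below are made up, not part of BioJava, and it assumes you have already
collected the query IDs from the query file and the IDs the parser
actually reported.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not a BioJava class.
class UnmatchedQueries {

    // Returns the query IDs that never showed up among the parsed hits,
    // in the order they appear in the query file.
    static Set<String> unmatched(List<String> allQueryIds, Set<String> matchedIds) {
        Set<String> result = new LinkedHashSet<>(allQueryIds);
        result.removeAll(matchedIds);
        return result;
    }
}
```

Here allQueryIds would come from reading the query file, and matchedIds
from whatever your ContentHandler collects while parsing.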

I also had strange errors with the BLAST parser when parsing very big
output files, around 200 MB. I don't remember the details any more, but it
was something like the stack growing so large during parsing that my
computer ran short of memory. I didn't save any data into variables in the
parser, so the growth didn't come from that. I even removed all the code
from my ContentHandler, and the error still occurred.

The fix for the problem above was to split the output file into many
parts and parse them separately: the parser now works heavenly well ;)
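
A rough sketch of such a splitter in plain Java follows. It is not the
code used above, only one way it could be done. It assumes that each
per-query report in the plain-text output starts with a program banner
line such as "BLASTN 2.2.4 [...]"; check this against your own files.
The class name, the method names and the part file names are all
hypothetical.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not a BioJava class.
class BlastOutputSplitter {

    // Assumption: a report begins with a banner line naming the program
    // (BLASTN, BLASTP, BLASTX, TBLASTN, TBLASTX).
    static boolean isReportStart(String line) {
        return line.startsWith("BLAST") || line.startsWith("TBLAST");
    }

    // Copies the input line by line into part1.txt, part2.txt, ... in outDir,
    // starting a new part once the current one holds reportsPerPart reports.
    // Returns the number of part files written.
    static int split(Path input, Path outDir, int reportsPerPart) {
        int part = 0;
        int reportsInPart = 0;
        BufferedWriter out = null;
        // ISO-8859-1 maps every byte to a character, so decoding cannot fail.
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.ISO_8859_1)) {
            Files.createDirectories(outDir);
            String line;
            while ((line = in.readLine()) != null) {
                boolean start = isReportStart(line);
                if (out == null || (start && reportsInPart >= reportsPerPart)) {
                    if (out != null) {
                        out.close();
                    }
                    part++;
                    out = Files.newBufferedWriter(
                            outDir.resolve("part" + part + ".txt"),
                            StandardCharsets.ISO_8859_1);
                    reportsInPart = 0;
                }
                if (start) {
                    reportsInPart++;
                }
                out.write(line);
                out.newLine();
            }
            if (out != null) {
                out.close();
                out = null;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            // Only reached with an open writer if something failed above.
            if (out != null) {
                try {
                    out.close();
                } catch (IOException ignored) {
                    // the original failure is already being reported
                }
            }
        }
        return part;
    }
}
```

Because the file is read and written one line at a time, memory use does
not depend on the size of the input. Each part file can then be given to
the parser on its own.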

Maybe I have put a lot of work into this parser, but it has still been
easier than writing my own.


*Petri Pehkonen
*Software Designer / Bioinformatics
*Inst. of Applied Biotechnology
*University of Kuopio
*P.O.B. 1627, Kuopio 70211 FINLAND
*Phone (+358)40 7668027
*E-mail  Petri.Pehkonen@uku.fi