[Biojava-l] Parsing a huge Blast File with Biojava

David Huen smh1008 at cus.cam.ac.uk
Mon Nov 1 10:08:13 EST 2004


On Monday 01 Nov 2004 14:49, Can Gencer wrote:
> Hello everyone,
>
> We are trying to parse a fairly large file of multiple BLAST results
> (around 4GB), and the available computer has 1GB of RAM. However, when
> the cookbook code
> (http://www.biojava.org/docs/bj_in_anger/BlastParser.htm) is used with
> the BlastLikeSAXParser, it throws an OutOfMemoryError after a short
> while, even though I don't see the memory usage going up significantly
> when I monitor the system during parsing. It is the parse(InputSource)
> method that throws the error. Is there a way to solve this problem?
>
This is probably not the answer you want, but I'm parsing BLAST files at
least as large as yours without this problem using the BlastXMLParserFacade
class. It may serve as a temporary workaround until someone who
understands the other parser responds; I certainly don't.
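
For illustration only, here is a minimal sketch of the streaming idea behind
the XML route. It assumes the BLAST run was written in NCBI's XML output
format, and it uses the plain JDK SAX parser rather than
BlastXMLParserFacade, whose exact API is not shown here. The class name
BlastXmlHitCounter and its countHits method are made up for this example.
The handler keeps only a counter in memory, so the size of the input should
not matter, and it answers the DOCTYPE lookup with an empty DTD so that the
parser does not try to fetch one.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of BioJava: streams through BLAST XML
// output and counts <Hit> elements without building a document tree.
class BlastXmlHitCounter {

    static class HitHandler extends DefaultHandler {
        int hits = 0;

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            // <Hit> is the per-hit element in NCBI BLAST XML output.
            if ("Hit".equals(qName)) {
                hits++;
            }
        }

        @Override
        public InputSource resolveEntity(String publicId, String systemId) {
            // Hand back an empty DTD instead of letting the parser
            // load the one named in the DOCTYPE declaration.
            return new InputSource(new StringReader(""));
        }
    }

    static int countHits(InputStream in)
            throws ParserConfigurationException, SAXException, IOException {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        HitHandler handler = new HitHandler();
        // SAX pushes events to the handler as it reads; nothing is retained
        // beyond what the handler itself chooses to store.
        parser.parse(in, handler);
        return handler.hits;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java BlastXmlHitCounter <blast-output.xml>");
            return;
        }
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println("hits: " + countHits(in));
        }
    }
}
```

A real application would replace the counter with whatever per-hit
processing it needs, taking care not to accumulate every hit in a
collection, since that would bring the memory problem back.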

There is also an alpha/beta-quality parser filter framework that could
perhaps be used with the XML parser framework in CVS.

Regards,
David Huen
P.S. A number of fixes to the XML parsing of NCBI Blastn output (the only
part I use; the other parts may work too) have gone into CVS, which may
make it workable for you now.  In particular, the irritating DTD-related
bug appears to be worked around.

