[Biojava-l] Parsing a huge Blast File with Biojava
David Huen
smh1008 at cus.cam.ac.uk
Mon Nov 1 10:08:13 EST 2004
On Monday 01 Nov 2004 14:49, Can Gencer wrote:
> Hello everyone,
>
> We are trying to parse a quite large multiple BLAST results file (around
> 4GB), and the computer available has 1GB of RAM. However, when the code
> in the cookbook is used (
> "http://www.biojava.org/docs/bj_in_anger/BlastParser.htm"), using the
> BlastLikeSAXParser it will give out an OutOfMemory exception after a
> short while, and when I monitor the system during the parsing, I don't
> see the memory usage going up significantly. It is the
> parse(InputSource) method that throws the exception. Is there a way to
> solve this problem ?
>
This is probably not the answer you want, but I'm parsing BLAST files at
least as large as yours without this problem using the BlastXMLParserFacade
class. It may serve as a temporary workaround until someone who understands
the other parser responds; I certainly don't.
There is also an alpha/beta-quality parser filter framework that could
perhaps be used with the XML parser framework in CVS.
Regards,
David Huen
P.S. A number of fixes have gone into the XML parsing for NCBI Blastn
output in CVS (Blastn is the only part I use; the other parts may work
too), which may make it workable for you now. In particular, the
irritating DTD-related bug appears to be worked around.
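
For what it's worth, here is a minimal sketch of the streaming idea behind
the XML route. It is not the BlastXMLParserFacade API. It uses only the
plain JDK SAX parser, and assumes the BLAST run was written as NCBI XML
(which uses <Hit> elements). The class name BlastHitCounter and the method
countHits are made up for illustration. The handler keeps only a counter,
so memory use should stay flat however large the file is, and it answers
the DTD lookup with an empty document so the parser does not try to fetch
the NCBI DTD.

```java
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: streams BLAST XML and counts <Hit> elements.
public class BlastHitCounter {

    // SAX handler that holds a single counter and nothing per hit.
    static final class HitCountingHandler extends DefaultHandler {
        int hits = 0;

        @Override
        public void startElement(String uri, String localName,
                                 String qName, Attributes atts) {
            // The default factory is not namespace-aware, so the element
            // name arrives in qName.
            if ("Hit".equals(qName)) {
                hits++;
            }
        }

        @Override
        public InputSource resolveEntity(String publicId, String systemId) {
            // Supply an empty DTD instead of loading the external one.
            return new InputSource(new StringReader(""));
        }
    }

    // Parses the stream once, front to back, and returns the hit count.
    public static int countHits(InputStream in) throws Exception {
        HitCountingHandler handler = new HitCountingHandler();
        SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        return handler.hits;
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to a BLAST XML file (assumed, not checked here).
        try (InputStream in = new FileInputStream(args[0])) {
            System.out.println("hits: " + countHits(in));
        }
    }
}
```

Anything the handler stores per hit (lists of alignments, for instance)
would bring the memory growth back, so any real work is best done inside
the callbacks, with the results written out as you go.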