[Biojava-l] Parsing blast result with a lot of hit

Rahul Karnik rahul at genebrew.com
Fri Nov 5 11:54:23 EST 2004


Lu Qiang wrote:
> This must be caused by a ArrayList storing all results.

You have diagnosed the problem perfectly. The BlastLikeSearchBuilder
used in the BioJava in Anger example stores all the hits in an
ArrayList, so when you parse a large BLAST results file, effectively
the whole file ends up in memory. A better approach is to print the
results to your output as you encounter them. For this, you probably
want to write your own implementation of the SearchContentHandler
interface (using BlastLikeSearchBuilder as a guide) that outputs the
results in the format you want, rather than storing them in a List.
Then replace BlastLikeSearchBuilder with your own implementation.
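
To make the idea concrete, here is a rough sketch of the "write each hit
as it arrives" pattern. Please treat it as an illustration only:
HitCallbacks is a hypothetical, cut-down stand-in I made up so the
example compiles on its own, not the real SearchContentHandler (which
declares more callbacks), and the "hitId" and "score" keys are assumed
names, so check which keys your parser actually reports.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for a SAX-style search callback
// interface. It is NOT BioJava's SearchContentHandler.
interface HitCallbacks {
    void startHit();
    void addHitProperty(Object key, Object value);
    void endHit();
}

// Writes each hit as one tab-separated line as soon as the hit ends,
// so only the current hit's properties are held in memory at any time.
class StreamingHitWriter implements HitCallbacks {
    private final Writer out;
    private final Map<Object, Object> current = new LinkedHashMap<>();
    private long hitCount = 0;

    StreamingHitWriter(Writer out) {
        this.out = out;
    }

    @Override
    public void startHit() {
        // Forget anything left over from the previous hit.
        current.clear();
    }

    @Override
    public void addHitProperty(Object key, Object value) {
        current.put(key, value);
    }

    @Override
    public void endHit() {
        try {
            // "hitId" and "score" are assumed key names, for illustration.
            out.write(current.get("hitId") + "\t" + current.get("score") + "\n");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        hitCount++;
        current.clear();
    }

    long getHitCount() {
        return hitCount;
    }
}
```

You would pass this a BufferedWriter wrapping your output file and flush
or close it once parsing finishes. Memory use then depends on the size
of a single hit rather than on the number of hits in the file.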

Note that it is probably easier to increase the memory available to
Java (for example, by raising the maximum heap size with the JVM's
-Xmx option, as in -Xmx512m), so try that first if you haven't
already. I would only recommend the approach described above if you
are running up against hardware limitations.

Thanks,
Rahul

