[Biojava-l] Parsing blast result with a lot of hit

PhxGM Gim phxgm at hotmail.com
Thu Nov 4 21:06:49 EST 2004


What is the exact message you are receiving from the JVM when it aborts? I'm 
*assuming* it's the standard java.lang.OutOfMemoryError. You can increase the 
heap size allocated to the JVM at startup by passing a few switches to the 
java invocation (for example -Xms for the initial heap and -Xmx for the 
maximum heap). There are complete tutorials on setting JVM heap sizes on the 
Sun site at java.sun.com. I have used these with some success when scaling 
Java apps, and I hope they apply to your situation.
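
As a quick sanity check (a minimal sketch; the class name HeapCheck is just for illustration), a tiny program can print the maximum heap the JVM reports, so you can confirm that a switch such as -Xmx actually took effect:

```java
// HeapCheck: prints the maximum heap the running JVM will try to use.
public class HeapCheck {
    // Max heap in megabytes, as reported by the runtime.
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMb() + " MB");
    }
}
```

Run it as, for example, java -Xmx1024m HeapCheck. The reported figure can come out somewhat below the -Xmx value, depending on the JVM.
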
Other than that, you can certainly avoid holding all those instances in 
memory at the same time, perhaps by reading them 'on demand' from storage. 
Either way, you will have to solve the issue by allocating more resources to 
the JVM or programmatically, by reading data only as needed instead of 
loading all of it into memory. As I haven't yet run into this particular 
issue in my own development with BioJava, I don't know what constraints it 
imposes on developers' ability to do this.
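
To illustrate the 'on demand' idea, here is a rough sketch in plain Java, not BioJava code: the Hit class, makeHit() and the fake e-values are hypothetical stand-ins for whatever your parser produces. The point is that collectAll() keeps every hit alive in an ArrayList, while forEachHit() hands each hit to a callback and lets it be garbage-collected right away:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class OnDemand {
    // Hypothetical, minimal hit record: keep only the fields you need.
    static final class Hit {
        final String id;
        final double evalue;

        Hit(String id, double evalue) {
            this.id = id;
            this.evalue = evalue;
        }
    }

    // Hypothetical stand-in for a parser producing the i-th hit (fake data).
    static Hit makeHit(int i) {
        return new Hit("seq" + i, i % 10 == 0 ? 1e-30 : 1.0);
    }

    // Memory-hungry style: every hit stays reachable until the list is dropped.
    static List<Hit> collectAll(int n) {
        List<Hit> all = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            all.add(makeHit(i));
        }
        return all;
    }

    // On-demand style: each hit is handed to the sink, then becomes garbage.
    static void forEachHit(int n, Consumer<Hit> sink) {
        for (int i = 0; i < n; i++) {
            sink.accept(makeHit(i));
        }
    }

    public static void main(String[] args) {
        int[] significant = {0};
        forEachHit(5000, h -> {
            if (h.evalue < 1e-10) {
                significant[0]++;
            }
        });
        System.out.println("significant hits: " + significant[0]);
    }
}
```

With the streaming version, memory use depends on what the callback chooses to keep, not on the number of hits in the report.
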
Again, I'm going to assume you have a BLAST XML output file, which in 
theory should be handled by either the BlastLikeSAXParser or the 
BlastXMLParser. From the BioJava docs on the BlastLikeSAXParser: "The 
biojava Blast-like parsing framework is designed to uses minimal memory,so 
that in principle, extremely large native outputs can be parsed and XML 
ContentHandlers can listen only for small amounts of information." 
(http://www.biojava.org/docs/api/org/biojava/bio/program/sax/BlastLikeSAXParser.html) 
You can register 'event-driven' SAX ContentHandlers that receive callbacks as 
the parser walks through the XML document. Again, it claims to scale; 
whether it actually does is another issue.
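
For a feel of the event-driven approach, here is a minimal sketch using the JDK's own SAX parser (javax.xml.parsers) rather than the BioJava classes, so it doesn't depend on any BioJava API. It assumes NCBI-style BLAST XML, where each hit is a <Hit> element and each HSP carries an <Hsp_evalue>; HitCounter is a hypothetical name. Only a counter and the best e-value are kept in memory, however many hits the file contains:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Streams NCBI-style BLAST XML and keeps only a hit count and the best e-value.
public class HitCounter extends DefaultHandler {
    private int hits = 0;
    private double bestEvalue = Double.POSITIVE_INFINITY;
    private final StringBuilder text = new StringBuilder();
    private boolean inEvalue = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("Hit".equals(qName)) {
            hits++; // one <Hit> element per database hit
        } else if ("Hsp_evalue".equals(qName)) {
            inEvalue = true;
            text.setLength(0); // start collecting this e-value's text
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // SAX may deliver element text in several chunks, so accumulate it.
        if (inEvalue) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("Hsp_evalue".equals(qName)) {
            bestEvalue = Math.min(bestEvalue, Double.parseDouble(text.toString().trim()));
            inEvalue = false;
        }
    }

    int getHits() { return hits; }
    double getBestEvalue() { return bestEvalue; }

    // Parses the stream in a single pass; nothing is retained per hit.
    static HitCounter parse(InputStream in) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // Skip fetching the NCBI DTD (feature of the JDK's bundled Xerces parser).
            factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            HitCounter handler = new HitCounter();
            factory.newSAXParser().parse(in, handler);
            return handler;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("BLAST XML parsing failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<BlastOutput><Hit><Hsp><Hsp_evalue>1e-50</Hsp_evalue></Hsp></Hit>"
                + "<Hit><Hsp><Hsp_evalue>0.003</Hsp_evalue></Hsp></Hit></BlastOutput>";
        HitCounter h = parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        System.out.println(h.getHits() + " hits, best e-value " + h.getBestEvalue());
    }
}
```

For a real report you would pass a FileInputStream instead of the inline string. The load-external-dtd feature is there because BLAST XML files usually declare the NCBI DTD; other SAX implementations may not recognise that feature name.
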

hope this has been of at least some help,

jess vermont
chicago

>From: "Lu Qiang" <luqiang at scbit.org>
>To: "biojava-l at biojava.org" <biojava-l at biojava.org>
>Subject: [Biojava-l] Parsing blast result with a lot of hit
>Date: Thu, 4 Nov 2004 18:42:20 +0000
>
>Hi, Guys,
>
>If we are trying to parse a blast result with a lot of hits, the machine 
>crashes, for example when 5000 sequences are blasted against themselves.
>
>This must be caused by an ArrayList storing all results.
>
>How to solve this problem?
>
>regards,
>
>Lu
>
>
>_______________________________________________
>Biojava-l mailing list  -  Biojava-l at biojava.org
>http://biojava.org/mailman/listinfo/biojava-l
