[Biojava-l] Out of heap space during structure parsing.

Wed Mar 11 13:17:15 UTC 2009

I believe I have just answered my own question. Since I am only using BioJava as a parser (for now), I turned off the SEQRES-to-ATOM alignment by adding the following line:

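    try {
      pdbreader = new PDBFileReader();
      pdbreader.setPath(localFilePath);
      pdbreader.setAlignSeqRes(false); // added this line: skip SEQRES-to-ATOM alignment
      pdbreader.setAutoFetch(true);
      struc = pdbreader.getStructureById(pdbCode);
    } catch (IOException e) {
      // e.g. the file could not be read locally or fetched
      e.printStackTrace();
    }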

But how do people handle this problem when they do need the sequence alignment?

Paul
--- On Wed, 3/11/09, Paul B <tallpaulinjax at yahoo.com> wrote:

From: Paul B <tallpaulinjax at yahoo.com>
Subject: Re: Out of heap space during structure parsing.
To: biojava-l at biojava.org
Date: Wednesday, March 11, 2009, 9:06 AM

Sorry, I sent this earlier, but then I read that attaching a file can cause spam problems. So here is my netbeans.conf file inline, showing 1024 MB of RAM allocated to the heap:

# ${HOME} will be replaced by JVM user.home system property
# netbeans_default_userdir="${HOME}/.netbeans/6.5"
# Options used by NetBeans launcher by default, can be overridden by explicit
# command line switches:
netbeans_default_options="-J-Dorg.glassfish.v3.installRoot=\"C:\Program Files\glassfish-v3-prelude\" -J-Dcom.sun.aas.installRoot=\"C:\Program Files\glassfish-v2ur2\" -J-client -J-Xverify:none -J-Xss2m -J-Xms32m -J-XX:PermSize=32m -J-XX:MaxPermSize=200m -J-Dapple.laf.useScreenMenuBar=true -J-Dsun.java2d.noddraw=true"
# Note that a default -Xmx is selected for you automatically.
# You can find this value in var/log/messages.log file in your userdir.
# The automatically selected value can be overridden by specifying -J-Xmx here
# or on the command line.
# command line switches
netbeans_default_options="-J-Xms32m -J-Xmx1024m -J-XX:PermSize=32m -J-XX:MaxPermSize=96m -J-Xverify:none -J-Dapple.laf.useScreenMenuBar=true -J-XX:+UseConcMarkSweepGC -J-XX:+CMSClassUnloadingEnabled -J-XX:+CMSPermGenSweepingEnabled -J-XX:+UseParNewGC"
# If you specify the heap size (-Xmx) explicitly, you may also want to enable
# Concurrent Mark & Sweep garbage collector. In such case add the following
# options to the netbeans_default_options:
# -J-XX:+UseConcMarkSweepGC -J-XX:+CMSClassUnloadingEnabled -J-XX:+CMSPermGenSweepingEnabled
# -J-XX:+UseParNewGC 
# (see http://wiki.netbeans.org/wiki/view/FaqGCPauses)
# Default location of JDK, can be overridden by using --jdkhome <dir>:
netbeans_jdkhome="C:\Program Files\Java\jdk1.6.0_06"
# Additional module clusters, using ${path.separator} (';' on Windows or ':' on Unix):
#netbeans_extraclusters="/absolute/path/to/cluster1:/absolute/path/to/cluster2"
# If you have problems with proxy-settings detection, you may want to enable
# detection of the proxy settings provided by JDK 5 or higher.
# In such case add -J-Djava.net.useSystemProxies=true to the netbeans_default_options.
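One caveat about the setting above: in NetBeans, -J-Xmx1024m in netbeans.conf sizes the heap of the JVM that runs the IDE itself. A program started from the IDE normally runs in a separate JVM whose heap is set through its own options (for a NetBeans Java project, the VM Options field in the project's Run properties, e.g. -Xmx1024m). A minimal way to check what heap the parsing program actually gets is to print Runtime.getRuntime().maxMemory() from inside it. The class name HeapCheck is a hypothetical helper for illustration, not part of BioJava.

```java
public class HeapCheck {
    // Converts a byte count to whole mebibytes (1 MiB = 1024 * 1024 bytes).
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // maxMemory() is the largest heap this JVM will try to use, i.e. the
        // effective -Xmx of the JVM running this program (not the IDE's JVM).
        long max = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap available to this JVM: " + toMiB(max) + " MiB");
    }
}
```

Running it with, for example, java -Xmx1024m HeapCheck should report a value close to 1024 MiB; if the same code run from the IDE reports much less, the 1 GB setting is not reaching the program.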

--- On Wed, 3/11/09, Paul B <tallpaulinjax at yahoo.com> wrote:

From: Paul B <tallpaulinjax at yahoo.com>
Subject: Out of heap space during structure parsing.
To: biojava-l at biojava.org
Date: Wednesday, March 11, 2009, 8:51 AM

Hi,

I am using BioJava 1.6.1 to parse PDB files. My machine has 2 GB of RAM. I am using NetBeans 6.5 as my development environment, with Java 1.6. My user-specific netbeans.conf file is attached, with the heap space set to 1 GB. The relevant BioJava code is below:

    try {
      pdbreader = new PDBFileReader();
      pdbreader.setPath(localFilePath);
      pdbreader.setAutoFetch(true); 
      struc = pdbreader.getStructureById(pdbCode);
    ...

Using this code, I successfully parsed smaller PDB files such as 2BEG and 1Q80. Then I tried to parse a larger file, 1FFK, and received this message on the 'struc =' line:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at org.biojava.bio.alignment.NeedlemanWunsch.pairwiseAlignment(NeedlemanWunsch.java:411)
        at org.biojava.bio.alignment.NeedlemanWunsch.getAlignment(NeedlemanWunsch.java:315)
        at org.biojava.bio.structure.io.SeqRes2AtomAligner.align(SeqRes2AtomAligner.java:220)
        at org.biojava.bio.structure.io.SeqRes2AtomAligner.align(SeqRes2AtomAligner.java:140)
        at org.biojava.bio.structure.io.PDBFileParser.triggerEndFileChecks(PDBFileParser.java:2249)
        at org.biojava.bio.structure.io.PDBFileParser.parsePDBFile(PDBFileParser.java:2155)
        at org.biojava.bio.structure.io.PDBFileParser.parsePDBFile(PDBFileParser.java:2013)
        at org.biojava.bio.structure.io.PDBFileReader.getStructureById(PDBFileReader.java:439)
        at biojavatest.PdbDemo.grabPdbFileStruc(PdbDemo.java:105)
        at biojavatest.PdbDemo.runTest(PdbDemo.java:67)
        at biojavatest.PdbDemo.main(PdbDemo.java:58)

Any suggestions? Is the problem caused by something unusual in 1FFK, or by BioJava's parser implementation?
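The trace shows the failure inside NeedlemanWunsch.pairwiseAlignment, called from SeqRes2AtomAligner while matching SEQRES records to ATOM records. A textbook Needleman-Wunsch alignment fills an (n+1) x (m+1) dynamic-programming matrix, so its memory grows with the product of the two sequence lengths, and structures with very long chains get expensive quickly. The arithmetic below is a rough sketch assuming a single dense matrix of 8-byte doubles; BioJava's actual implementation may keep additional matrices or use different element types, and the chain lengths are illustrative only.

```java
public class DpMemoryEstimate {
    // Bytes for one dense (n+1) x (m+1) matrix of 8-byte doubles, as filled by
    // a textbook Needleman-Wunsch alignment; JVM array headers are ignored.
    static long denseMatrixBytes(int n, int m) {
        return (long) (n + 1) * (m + 1) * 8L;
    }

    public static void main(String[] args) {
        // Illustrative chain lengths only, not taken from any particular PDB entry.
        int[] lengths = {300, 1000, 3000};
        for (int len : lengths) {
            long bytes = denseMatrixBytes(len, len);
            System.out.printf("%d x %d residues -> %d bytes (about %d MiB)%n",
                    len, len, bytes, bytes / (1024L * 1024L));
        }
    }
}
```

Under this assumption, two 3000-residue chains already need about 72 million bytes for one matrix, and doubling both lengths quadruples that, which is why disabling the SEQRES alignment avoids the error entirely.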

By the way, I am using BioJava simply as a parser: I then copy the data into classes of my own design and persist them to a SQL Server database. As such, I don't need all the ATOM information held in memory. Is there perhaps a way to lazy-load that information on demand?
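On the lazy-loading question: the BioJava calls in this thread build a complete Structure object in memory. As a rough illustration of the alternative, a streaming reader can hand each ATOM/HETATM record to a callback as soon as it is read, so nothing beyond the current record has to stay in memory unless the callback keeps it. The sketch below is a hypothetical, hand-rolled reader, not part of BioJava; the names AtomStreamer and AtomRecord are invented, and it relies only on the fixed-column layout of PDB coordinate records. It does no SEQRES alignment, alternate-location handling, or error recovery.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

public class AtomStreamer {
    // Hypothetical minimal record: only the fields one might persist.
    static final class AtomRecord {
        final String chainId;
        final int resSeq;
        final String resName;
        final String atomName;
        final double x, y, z;

        AtomRecord(String chainId, int resSeq, String resName, String atomName,
                   double x, double y, double z) {
            this.chainId = chainId;
            this.resSeq = resSeq;
            this.resName = resName;
            this.atomName = atomName;
            this.x = x;
            this.y = y;
            this.z = z;
        }
    }

    // Reads PDB-format text line by line and passes each ATOM/HETATM record
    // to the sink as soon as it is parsed; returns the number of records seen.
    static int stream(Reader in, Consumer<AtomRecord> sink) {
        try (BufferedReader br = new BufferedReader(in)) {
            int count = 0;
            String line;
            while ((line = br.readLine()) != null) {
                boolean coord = line.startsWith("ATOM  ") || line.startsWith("HETATM");
                if (!coord || line.length() < 54) {
                    continue; // skip headers, SEQRES, TER, truncated lines, etc.
                }
                // PDB fixed columns (1-based): atom name 13-16, residue name 18-20,
                // chain ID 22, residue number 23-26, x 31-38, y 39-46, z 47-54.
                sink.accept(new AtomRecord(
                        line.substring(21, 22),
                        Integer.parseInt(line.substring(22, 26).trim()),
                        line.substring(17, 20).trim(),
                        line.substring(12, 16).trim(),
                        Double.parseDouble(line.substring(30, 38).trim()),
                        Double.parseDouble(line.substring(38, 46).trim()),
                        Double.parseDouble(line.substring(46, 54).trim())));
                count++;
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String pdb = "HEADER    TEST\n"
                + "ATOM      1  N   ALA A   1      11.104   6.134  -6.504\n"
                + "END\n";
        // In real use, the sink would write each record to the database instead.
        int n = stream(new StringReader(pdb), a ->
                System.out.println(a.chainId + " " + a.resName + a.resSeq + " " + a.atomName));
        System.out.println(n + " coordinate record(s)");
    }
}
```

For a real file, a FileReader (or a reader over a gzipped download) would replace the StringReader, and the sink would insert each record into the database before the next line is read.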

Is there a downloadable development version of BioJava that offers a more memory-efficient way of loading the data?
Thanks,

Paul