[Biojava-dev] [BioJava - Bug #3345] (Closed) Static object cache in SimpleRichObjectBuilder causing memory leak

George Waldon gwaldon at geneinfinity.org
Thu Apr 26 22:03:42 UTC 2012


Hi Tjeerd,

You can certainly reopen a bug.

I am not the expert here, but looking at the code, RichObjectFactory  
uses a LRU cache as a temporary cache for objects that are reused  
often while SimpleRichObjectBuilder caches everything. I doubt you can  
remove this last one.

Maybe you can write your own RichObjectBuilder that would cache only  
things you are interested in (to compare?). I bet you generate a lot  
of DocRef that do not need to be cached; then use  
RichObjectFactory.setRichObjectBuilder to set your partial builder  
before you parse. Just an idea.

Hope this helps,
George

Quoting Tjeerd Boerman <twboerman at gmail.com>:

> Wow. I'm quite emberassed, but the issue is actually not fixed in  
> 1.8.2. The RichObjectFactory class uses an LRU cache, while the  
> SimpleRichObjectBuilder class uses the cache described in my earlier  
> post, and I mixed up the two. Sorry about that, I must have been  
> staring at this code for too long today.
>
> It turns out the RichObjectFactory class is the only user of  
> SimpleRichObjectBuilder, and has its own LRU cache. So the cache in  
> SimpleRichObjectBuilder can probably be removed altogether. If the  
> experts agree I would be happy to write a patch for this.
>
> Can the old bug report be reopened, or should I file a new bug  
> report? Or should I submit a possible patch through this mailing list?
>
> Regards,
> Tjeerd
>
> On 4/26/2012 6:01 PM, redmine at redmine.open-bio.org wrote:
>> Issue #3345 has been updated by Andreas Prlic.
>>
>>  * Status changed from New to Closed
>>  * % Done changed from 0 to 100
>>
>> already fixed in 1.8.2
>>
>> ------------------------------------------------------------------------
>>
>>
>>  Bug #3345: Static object cache in SimpleRichObjectBuilder causing
>>  memory leak <https://redmine.open-bio.org/issues/3345>
>>
>>  * Author: Tjeerd Boerman
>>  * Status: Closed
>>  * Priority: Normal
>>  * Assignee: biojava-dev list
>>  * Category: bio
>>  * Target version: BioJava 1.8 - legacy
>>  * URL:
>>
>> I encountered a memory problem when parsing many Genbank files with  
>> the Biojava 1.8.1. The parsed files were protein GPFF (GenPept Flat  
>> File format) files from the latest RefSeq release. The application  
>> tried to parse millions of protein sequences from these files, but  
>> an OutOfMemoryException would always occur after some time. The  
>> used heap space would gradually increase from a couple hundred  
>> megabytes to over 1.5 GB, until the heap could grow no further.  
>> Upon inspection I discovered a HashMap in RichSequenceBuilder was  
>> the culprit:
>>
>> public class SimpleRichObjectBuilder implements RichObjectBuilder {
>>
>>     private static Map objects = new HashMap();
>>
>>     public Object buildObject(Class clazz, List paramsList) {
>>         ...
>>
>>         // return the constructed object from the hashmap if there already
>>         if (contents.containsKey(ourParamsList)) return  
>> contents.get(ourParamsList);
>>
>>         ...
>>
>>         // Instantiate it with the parameters
>>         Object o = c.newInstance(ourParamsList.toArray());
>>
>>         // store it for later in the singleton map
>>         contents.put(ourParamsList, o);
>>
>>         ...
>>     }
>> }
>>
>> It seems the *objects* Map in SimpleRichSequenceBuilder is used as  
>> a static cache for objects, but when many different objects are  
>> created this cache grows out of control. I am unsure if this is a  
>> 'true' bug, but for my application it was a definite problem. My  
>> fix was to simply comment out the *contents.put()* statement, but  
>> I'm sure there is a better way to resolve this - perhaps by making  
>> the use of the cache optional through a configuration option.
>>
>> ------------------------------------------------------------------------
>>
>> You have received this notification because you have either  
>> subscribed to it, or are involved in it.
>> To change your notification preferences, please click here and  
>> login: http://redmine.open-bio.org
>>
> _______________________________________________
> biojava-dev mailing list
> biojava-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>



--------------------------------
George Waldon




More information about the biojava-dev mailing list