[Biojava-dev] [BioJava - Bug #3345] (Closed) Static object cache in SimpleRichObjectBuilder causing memory leak

Tjeerd Boerman twboerman at gmail.com
Fri Apr 27 10:57:14 UTC 2012


Hi George,

A very good point. While the specification of the RichObjectBuilder 
interface does not *enforce* that implementations persist the objects, 
it seems to be the intended use. Both implementations 
(BioSQLRichObjectBuilder and SimpleRichObjectBuilder) persist all 
created objects. Creating my own class would work for me, but this 
should be fixed in BioJava codebase as well, correct? Creating another 
RichObjectBuilder implementation that works in finite memory would work, 
but it would have to be 'plugged in' by editing the RichObjectFactory or 
enabled through a configuration option - both seem undesirable. Any ideas?

Tjeerd

On 4/27/2012 6:44 AM, George Waldon wrote:
> Hi Tjeerd,
>
> The use of a singleton map for rich objects with a Hibernate database 
> is described in the biojavax documentation; see also the class 
> BioSQLRichObjectBuilder.
>
> You can implement your solution simply by creating your own 
> implementation of RichObjectBuilder as I mentioned before. Maybe you 
> could store the cache values as WeakReference and recreated objects 
> only when needed. Or you could filter the list of parameters, e.g. in 
> the case of SimpleDocRef return the same singleton representing an 
> empty DocRef if you are not interested in those, etc.
>
> - George
>
> Quoting Tjeerd Boerman <twboerman at gmail.com>:
>
>> Hey,
>>
>> I cannot reopen the bug myself, but I'm sure Andreas or another 
>> manager of the bugtracker can.
>>
>> For my purposes the memory usage could be kept in check by disabling 
>> the cache in SimpleRichObjectBuilder. It's hard to say what the 
>> performance hit is, but it's not been noticeable in my application. A 
>> permanent solution needs some more thought.
>>
>> I double-checked, and all calls to SimpleObjectBuilder.getObject() go 
>> through RichObjectFactory.getObject() - it effectively delegates the 
>> call to the SimpleObjectBuilder. The SimpleObjectBuilder class keeps 
>> a cache of unrestricted size, with one object for every unique 
>> combination of class and parameters. The RichObjectFactory class also 
>> keeps a cache of objects, but this cache is restricted to containing 
>> instances for at most 20 different object classes at any given time.
>> So we can think of this as a level 1 cache in RichObjectFactory, and 
>> a level 2 cache in SimpleObjectBuilder. The level 1 cache may contain 
>> at most 20 entries, and the level 2 cache may contain at most the 
>> amount of instantiable classes in BioJava, which will be a couple 
>> hundred *maximum*. I don't think this difference justifies the use of 
>> two caches.
>>
>> Possible fix:
>> Unless I missed something, removing the cache from 
>> SimpleObjectBuilder and increasing the size of the LRU cache in 
>> RichObjectFactory to compensate seems like a good option. This would 
>> solve the unbounded growth of the level 2 cache by eliminating it.
>>
>>
>>
>> There is another issue with the level 1 cache that could cause 
>> unbounded growth. Look at this piece of code in 
>> RichObjectFactory.getObject():
>>
>>    Map m = (Map)cache.get(applicationClass);
>>    if (!m.containsKey(paramsList)) {
>>        m.put(paramsList,builder.buildObject(applicationClass, 
>> paramsList));
>>    }
>>
>> For every unique combination of parameters, the object is built and 
>> stored in the Map associated with that class. This Map is not a LRU 
>> implementation, so creating 10,000,000 instances of a class (i.e. 
>> SimpleDocRef) with different parameters would cause every single 
>> instance to be saved in the cache. Such a situation would again cause 
>> memory problems.
>>
>> Possible fix:
>> Well that's an easy one! Replace this simple Map cache with a LRU 
>> implementation, as proposed before.
>>
>> Regards,
>> Tjeerd
>>
>>
>> On 4/27/2012 12:03 AM, George Waldon wrote:
>>> Hi Tjeerd,
>>>
>>> You can certainly reopen a bug.
>>>
>>> I am not the expert here, but looking at the code, RichObjectFactory 
>>> uses a LRU cache as a temporary cache for objects that are reused 
>>> often while SimpleRichObjectBuilder caches everything. I doubt you 
>>> can remove this last one.
>>>
>>> Maybe you can write your own RichObjectBuilder that would cache only 
>>> things you are interested in (to compare?). I bet you generate a lot 
>>> of DocRef that do not need to be cached; then use 
>>> RichObjectFactory.setRichObjectBuilder to set your partial builder 
>>> before you parse. Just an idea.
>>>
>>> Hope this helps,
>>> George
>>>
>>> Quoting Tjeerd Boerman <twboerman at gmail.com>:
>>>
>>>> Wow. I'm quite emberassed, but the issue is actually not fixed in 
>>>> 1.8.2. The RichObjectFactory class uses an LRU cache, while the 
>>>> SimpleRichObjectBuilder class uses the cache described in my 
>>>> earlier post, and I mixed up the two. Sorry about that, I must have 
>>>> been staring at this code for too long today.
>>>>
>>>> It turns out the RichObjectFactory class is the only user of 
>>>> SimpleRichObjectBuilder, and has its own LRU cache. So the cache in 
>>>> SimpleRichObjectBuilder can probably be removed altogether. If the 
>>>> experts agree I would be happy to write a patch for this.
>>>>
>>>> Can the old bug report be reopened, or should I file a new bug 
>>>> report? Or should I submit a possible patch through this mailing list?
>>>>
>>>> Regards,
>>>> Tjeerd
>>>>
>>>> On 4/26/2012 6:01 PM, redmine at redmine.open-bio.org wrote:
>>>>> Issue #3345 has been updated by Andreas Prlic.
>>>>>
>>>>> * Status changed from New to Closed
>>>>> * % Done changed from 0 to 100
>>>>>
>>>>> already fixed in 1.8.2
>>>>>
>>>>> ------------------------------------------------------------------------  
>>>>> Bug #3345: Static object cache in SimpleRichObjectBuilder causing
>>>>> memory leak <https://redmine.open-bio.org/issues/3345>
>>>>>
>>>>> * Author: Tjeerd Boerman
>>>>> * Status: Closed
>>>>> * Priority: Normal
>>>>> * Assignee: biojava-dev list
>>>>> * Category: bio
>>>>> * Target version: BioJava 1.8 - legacy
>>>>> * URL:
>>>>>
>>>>> I encountered a memory problem when parsing many Genbank files 
>>>>> with the Biojava 1.8.1. The parsed files were protein GPFF 
>>>>> (GenPept Flat File format) files from the latest RefSeq release. 
>>>>> The application tried to parse millions of protein sequences from 
>>>>> these files, but an OutOfMemoryException would always occur after 
>>>>> some time. The used heap space would gradually increase from a 
>>>>> couple hundred megabytes to over 1.5 GB, until the heap could grow 
>>>>> no further. Upon inspection I discovered a HashMap in 
>>>>> RichSequenceBuilder was the culprit:
>>>>>
>>>>> public class SimpleRichObjectBuilder implements RichObjectBuilder {
>>>>>
>>>>>    private static Map objects = new HashMap();
>>>>>
>>>>>    public Object buildObject(Class clazz, List paramsList) {
>>>>>        ...
>>>>>
>>>>>        // return the constructed object from the hashmap if there 
>>>>> already
>>>>>        if (contents.containsKey(ourParamsList)) return 
>>>>> contents.get(ourParamsList);
>>>>>
>>>>>        ...
>>>>>
>>>>>        // Instantiate it with the parameters
>>>>>        Object o = c.newInstance(ourParamsList.toArray());
>>>>>
>>>>>        // store it for later in the singleton map
>>>>>        contents.put(ourParamsList, o);
>>>>>
>>>>>        ...
>>>>>    }
>>>>> }
>>>>>
>>>>> It seems the *objects* Map in SimpleRichSequenceBuilder is used as 
>>>>> a static cache for objects, but when many different objects are 
>>>>> created this cache grows out of control. I am unsure if this is a 
>>>>> 'true' bug, but for my application it was a definite problem. My 
>>>>> fix was to simply comment out the *contents.put()* statement, but 
>>>>> I'm sure there is a better way to resolve this - perhaps by making 
>>>>> the use of the cache optional through a configuration option.
>>>>>
>>>>> ------------------------------------------------------------------------ 
>>>>> You have received this notification because you have either 
>>>>> subscribed to it, or are involved in it.
>>>>> To change your notification preferences, please click here and 
>>>>> login: http://redmine.open-bio.org
>>>>>
>>>> _______________________________________________
>>>> biojava-dev mailing list
>>>> biojava-dev at lists.open-bio.org
>>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>>>
>>>
>>>
>>>
>>> --------------------------------
>>> George Waldon
>>>
>>
>
>
>



More information about the biojava-dev mailing list