[Biojava-dev] [BioJava - Bug #3345] (Closed) Static object cache in SimpleRichObjectBuilder causing memory leak

George Waldon gwaldon at geneinfinity.org
Fri Apr 27 13:45:17 UTC 2012


You can use RichObjectFactory.setRichObjectBuilder(RichObjectBuilder  
b) to enable you own builder.

  -  George

Quoting Tjeerd Boerman <twboerman at gmail.com>:

> Hi George,
>
> A very good point. While the specification of the RichObjectBuilder  
> interface does not *enforce* that implementations persist the  
> objects, it seems to be the intended use. Both implementations  
> (BioSQLRichObjectBuilder and SimpleRichObjectBuilder) persist all  
> created objects. Creating my own class would work for me, but this  
> should be fixed in BioJava codebase as well, correct? Creating  
> another RichObjectBuilder implementation that works in finite memory  
> would work, but it would have to be 'plugged in' by editing the  
> RichObjectFactory or enabled through a configuration option - both  
> seem undesirable. Any ideas?
>
> Tjeerd
>
> On 4/27/2012 6:44 AM, George Waldon wrote:
>> Hi Tjeerd,
>>
>> The use of a singleton map for rich objects with a Hibernate  
>> database is described in the biojavax documentation; see also the  
>> class BioSQLRichObjectBuilder.
>>
>> You can implement your solution simply by creating your own  
>> implementation of RichObjectBuilder as I mentioned before. Maybe  
>> you could store the cache values as WeakReference and recreated  
>> objects only when needed. Or you could filter the list of  
>> parameters, e.g. in the case of SimpleDocRef return the same  
>> singleton representing an empty DocRef if you are not interested in  
>> those, etc.
>>
>> - George
>>
>> Quoting Tjeerd Boerman <twboerman at gmail.com>:
>>
>>> Hey,
>>>
>>> I cannot reopen the bug myself, but I'm sure Andreas or another  
>>> manager of the bugtracker can.
>>>
>>> For my purposes the memory usage could be kept in check by  
>>> disabling the cache in SimpleRichObjectBuilder. It's hard to say  
>>> what the performance hit is, but it's not been noticeable in my  
>>> application. A permanent solution needs some more thought.
>>>
>>> I double-checked, and all calls to SimpleObjectBuilder.getObject()  
>>> go through RichObjectFactory.getObject() - it effectively  
>>> delegates the call to the SimpleObjectBuilder. The  
>>> SimpleObjectBuilder class keeps a cache of unrestricted size, with  
>>> one object for every unique combination of class and parameters.  
>>> The RichObjectFactory class also keeps a cache of objects, but  
>>> this cache is restricted to containing instances for at most 20  
>>> different object classes at any given time.
>>> So we can think of this as a level 1 cache in RichObjectFactory,  
>>> and a level 2 cache in SimpleObjectBuilder. The level 1 cache may  
>>> contain at most 20 entries, and the level 2 cache may contain at  
>>> most the amount of instantiable classes in BioJava, which will be  
>>> a couple hundred *maximum*. I don't think this difference  
>>> justifies the use of two caches.
>>>
>>> Possible fix:
>>> Unless I missed something, removing the cache from  
>>> SimpleObjectBuilder and increasing the size of the LRU cache in  
>>> RichObjectFactory to compensate seems like a good option. This  
>>> would solve the unbounded growth of the level 2 cache by  
>>> eliminating it.
>>>
>>>
>>>
>>> There is another issue with the level 1 cache that could cause  
>>> unbounded growth. Look at this piece of code in  
>>> RichObjectFactory.getObject():
>>>
>>>   Map m = (Map)cache.get(applicationClass);
>>>   if (!m.containsKey(paramsList)) {
>>>       m.put(paramsList,builder.buildObject(applicationClass, paramsList));
>>>   }
>>>
>>> For every unique combination of parameters, the object is built  
>>> and stored in the Map associated with that class. This Map is not  
>>> a LRU implementation, so creating 10,000,000 instances of a class  
>>> (i.e. SimpleDocRef) with different parameters would cause every  
>>> single instance to be saved in the cache. Such a situation would  
>>> again cause memory problems.
>>>
>>> Possible fix:
>>> Well that's an easy one! Replace this simple Map cache with a LRU  
>>> implementation, as proposed before.
>>>
>>> Regards,
>>> Tjeerd
>>>
>>>
>>> On 4/27/2012 12:03 AM, George Waldon wrote:
>>>> Hi Tjeerd,
>>>>
>>>> You can certainly reopen a bug.
>>>>
>>>> I am not the expert here, but looking at the code,  
>>>> RichObjectFactory uses a LRU cache as a temporary cache for  
>>>> objects that are reused often while SimpleRichObjectBuilder  
>>>> caches everything. I doubt you can remove this last one.
>>>>
>>>> Maybe you can write your own RichObjectBuilder that would cache  
>>>> only things you are interested in (to compare?). I bet you  
>>>> generate a lot of DocRef that do not need to be cached; then use  
>>>> RichObjectFactory.setRichObjectBuilder to set your partial  
>>>> builder before you parse. Just an idea.
>>>>
>>>> Hope this helps,
>>>> George
>>>>
>>>> Quoting Tjeerd Boerman <twboerman at gmail.com>:
>>>>
>>>>> Wow. I'm quite emberassed, but the issue is actually not fixed  
>>>>> in 1.8.2. The RichObjectFactory class uses an LRU cache, while  
>>>>> the SimpleRichObjectBuilder class uses the cache described in my  
>>>>> earlier post, and I mixed up the two. Sorry about that, I must  
>>>>> have been staring at this code for too long today.
>>>>>
>>>>> It turns out the RichObjectFactory class is the only user of  
>>>>> SimpleRichObjectBuilder, and has its own LRU cache. So the cache  
>>>>> in SimpleRichObjectBuilder can probably be removed altogether.  
>>>>> If the experts agree I would be happy to write a patch for this.
>>>>>
>>>>> Can the old bug report be reopened, or should I file a new bug  
>>>>> report? Or should I submit a possible patch through this mailing  
>>>>> list?
>>>>>
>>>>> Regards,
>>>>> Tjeerd
>>>>>
>>>>> On 4/26/2012 6:01 PM, redmine at redmine.open-bio.org wrote:
>>>>>> Issue #3345 has been updated by Andreas Prlic.
>>>>>>
>>>>>> * Status changed from New to Closed
>>>>>> * % Done changed from 0 to 100
>>>>>>
>>>>>> already fixed in 1.8.2
>>>>>>
>>>>>> ------------------------------------------------------------------------  Bug #3345: Static object cache in SimpleRichObjectBuilder  
>>>>>> causing
>>>>>> memory leak <https://redmine.open-bio.org/issues/3345>
>>>>>>
>>>>>> * Author: Tjeerd Boerman
>>>>>> * Status: Closed
>>>>>> * Priority: Normal
>>>>>> * Assignee: biojava-dev list
>>>>>> * Category: bio
>>>>>> * Target version: BioJava 1.8 - legacy
>>>>>> * URL:
>>>>>>
>>>>>> I encountered a memory problem when parsing many Genbank files  
>>>>>> with the Biojava 1.8.1. The parsed files were protein GPFF  
>>>>>> (GenPept Flat File format) files from the latest RefSeq  
>>>>>> release. The application tried to parse millions of protein  
>>>>>> sequences from these files, but an OutOfMemoryException would  
>>>>>> always occur after some time. The used heap space would  
>>>>>> gradually increase from a couple hundred megabytes to over 1.5  
>>>>>> GB, until the heap could grow no further. Upon inspection I  
>>>>>> discovered a HashMap in RichSequenceBuilder was the culprit:
>>>>>>
>>>>>> public class SimpleRichObjectBuilder implements RichObjectBuilder {
>>>>>>
>>>>>>   private static Map objects = new HashMap();
>>>>>>
>>>>>>   public Object buildObject(Class clazz, List paramsList) {
>>>>>>       ...
>>>>>>
>>>>>>       // return the constructed object from the hashmap if there already
>>>>>>       if (contents.containsKey(ourParamsList)) return  
>>>>>> contents.get(ourParamsList);
>>>>>>
>>>>>>       ...
>>>>>>
>>>>>>       // Instantiate it with the parameters
>>>>>>       Object o = c.newInstance(ourParamsList.toArray());
>>>>>>
>>>>>>       // store it for later in the singleton map
>>>>>>       contents.put(ourParamsList, o);
>>>>>>
>>>>>>       ...
>>>>>>   }
>>>>>> }
>>>>>>
>>>>>> It seems the *objects* Map in SimpleRichSequenceBuilder is used  
>>>>>> as a static cache for objects, but when many different objects  
>>>>>> are created this cache grows out of control. I am unsure if  
>>>>>> this is a 'true' bug, but for my application it was a definite  
>>>>>> problem. My fix was to simply comment out the *contents.put()*  
>>>>>> statement, but I'm sure there is a better way to resolve this -  
>>>>>> perhaps by making the use of the cache optional through a  
>>>>>> configuration option.
>>>>>>
>>>>>> ------------------------------------------------------------------------ You have received this notification because you have either subscribed to it, or are involved in  
>>>>>> it.
>>>>>> To change your notification preferences, please click here and  
>>>>>> login: http://redmine.open-bio.org
>>>>>>
>>>>> _______________________________________________
>>>>> biojava-dev mailing list
>>>>> biojava-dev at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>>>>
>>>>
>>>>
>>>>
>>>> --------------------------------
>>>> George Waldon
>>>>
>>>
>>
>>
>>
>




More information about the biojava-dev mailing list