[Biojava-dev] [BioJava - Bug #3345] (Closed) Static object cache in SimpleRichObjectBuilder causing memory leak

Tjeerd Boerman twboerman at gmail.com
Fri May 11 13:42:19 UTC 2012


Any thoughts on this from the Biojava authors? I strongly believe that 
the default behavior of BioJava should not be "unbounded memory usage 
unless you implement your own RichObjectBuilder".

- Tjeerd

On 4/27/2012 3:45 PM, George Waldon wrote:
> You can use RichObjectFactory.setRichObjectBuilder(RichObjectBuilder 
> b) to enable you own builder.
>
>  -  George
>
> Quoting Tjeerd Boerman <twboerman at gmail.com>:
>
>> Hi George,
>>
>> A very good point. While the specification of the RichObjectBuilder 
>> interface does not *enforce* that implementations persist the 
>> objects, it seems to be the intended use. Both implementations 
>> (BioSQLRichObjectBuilder and SimpleRichObjectBuilder) persist all 
>> created objects. Creating my own class would work for me, but this 
>> should be fixed in BioJava codebase as well, correct? Creating 
>> another RichObjectBuilder implementation that works in finite memory 
>> would work, but it would have to be 'plugged in' by editing the 
>> RichObjectFactory or enabled through a configuration option - both 
>> seem undesirable. Any ideas?
>>
>> Tjeerd
>>
>> On 4/27/2012 6:44 AM, George Waldon wrote:
>>> Hi Tjeerd,
>>>
>>> The use of a singleton map for rich objects with a Hibernate 
>>> database is described in the biojavax documentation; see also the 
>>> class BioSQLRichObjectBuilder.
>>>
>>> You can implement your solution simply by creating your own 
>>> implementation of RichObjectBuilder as I mentioned before. Maybe you 
>>> could store the cache values as WeakReference and recreated objects 
>>> only when needed. Or you could filter the list of parameters, e.g. 
>>> in the case of SimpleDocRef return the same singleton representing 
>>> an empty DocRef if you are not interested in those, etc.
>>>
>>> - George
>>>
>>> Quoting Tjeerd Boerman <twboerman at gmail.com>:
>>>
>>>> Hey,
>>>>
>>>> I cannot reopen the bug myself, but I'm sure Andreas or another 
>>>> manager of the bugtracker can.
>>>>
>>>> For my purposes the memory usage could be kept in check by 
>>>> disabling the cache in SimpleRichObjectBuilder. It's hard to say 
>>>> what the performance hit is, but it's not been noticeable in my 
>>>> application. A permanent solution needs some more thought.
>>>>
>>>> I double-checked, and all calls to SimpleObjectBuilder.getObject() 
>>>> go through RichObjectFactory.getObject() - it effectively delegates 
>>>> the call to the SimpleObjectBuilder. The SimpleObjectBuilder class 
>>>> keeps a cache of unrestricted size, with one object for every 
>>>> unique combination of class and parameters. The RichObjectFactory 
>>>> class also keeps a cache of objects, but this cache is restricted 
>>>> to containing instances for at most 20 different object classes at 
>>>> any given time.
>>>> So we can think of this as a level 1 cache in RichObjectFactory, 
>>>> and a level 2 cache in SimpleObjectBuilder. The level 1 cache may 
>>>> contain at most 20 entries, and the level 2 cache may contain at 
>>>> most the amount of instantiable classes in BioJava, which will be a 
>>>> couple hundred *maximum*. I don't think this difference justifies 
>>>> the use of two caches.
>>>>
>>>> Possible fix:
>>>> Unless I missed something, removing the cache from 
>>>> SimpleObjectBuilder and increasing the size of the LRU cache in 
>>>> RichObjectFactory to compensate seems like a good option. This 
>>>> would solve the unbounded growth of the level 2 cache by 
>>>> eliminating it.
>>>>
>>>>
>>>>
>>>> There is another issue with the level 1 cache that could cause 
>>>> unbounded growth. Look at this piece of code in 
>>>> RichObjectFactory.getObject():
>>>>
>>>>   Map m = (Map)cache.get(applicationClass);
>>>>   if (!m.containsKey(paramsList)) {
>>>>       m.put(paramsList,builder.buildObject(applicationClass, 
>>>> paramsList));
>>>>   }
>>>>
>>>> For every unique combination of parameters, the object is built and 
>>>> stored in the Map associated with that class. This Map is not a LRU 
>>>> implementation, so creating 10,000,000 instances of a class (i.e. 
>>>> SimpleDocRef) with different parameters would cause every single 
>>>> instance to be saved in the cache. Such a situation would again 
>>>> cause memory problems.
>>>>
>>>> Possible fix:
>>>> Well that's an easy one! Replace this simple Map cache with a LRU 
>>>> implementation, as proposed before.
>>>>
>>>> Regards,
>>>> Tjeerd
>>>>
>>>>
>>>> On 4/27/2012 12:03 AM, George Waldon wrote:
>>>>> Hi Tjeerd,
>>>>>
>>>>> You can certainly reopen a bug.
>>>>>
>>>>> I am not the expert here, but looking at the code, 
>>>>> RichObjectFactory uses a LRU cache as a temporary cache for 
>>>>> objects that are reused often while SimpleRichObjectBuilder caches 
>>>>> everything. I doubt you can remove this last one.
>>>>>
>>>>> Maybe you can write your own RichObjectBuilder that would cache 
>>>>> only things you are interested in (to compare?). I bet you 
>>>>> generate a lot of DocRef that do not need to be cached; then use 
>>>>> RichObjectFactory.setRichObjectBuilder to set your partial builder 
>>>>> before you parse. Just an idea.
>>>>>
>>>>> Hope this helps,
>>>>> George
>>>>>
>>>>> Quoting Tjeerd Boerman <twboerman at gmail.com>:
>>>>>
>>>>>> Wow. I'm quite emberassed, but the issue is actually not fixed in 
>>>>>> 1.8.2. The RichObjectFactory class uses an LRU cache, while the 
>>>>>> SimpleRichObjectBuilder class uses the cache described in my 
>>>>>> earlier post, and I mixed up the two. Sorry about that, I must 
>>>>>> have been staring at this code for too long today.
>>>>>>
>>>>>> It turns out the RichObjectFactory class is the only user of 
>>>>>> SimpleRichObjectBuilder, and has its own LRU cache. So the cache 
>>>>>> in SimpleRichObjectBuilder can probably be removed altogether. If 
>>>>>> the experts agree I would be happy to write a patch for this.
>>>>>>
>>>>>> Can the old bug report be reopened, or should I file a new bug 
>>>>>> report? Or should I submit a possible patch through this mailing 
>>>>>> list?
>>>>>>
>>>>>> Regards,
>>>>>> Tjeerd
>>>>>>
>>>>>> On 4/26/2012 6:01 PM, redmine at redmine.open-bio.org wrote:
>>>>>>> Issue #3345 has been updated by Andreas Prlic.
>>>>>>>
>>>>>>> * Status changed from New to Closed
>>>>>>> * % Done changed from 0 to 100
>>>>>>>
>>>>>>> already fixed in 1.8.2
>>>>>>>
>>>>>>> ------------------------------------------------------------------------  
>>>>>>> Bug #3345: Static object cache in SimpleRichObjectBuilder causing
>>>>>>> memory leak <https://redmine.open-bio.org/issues/3345>
>>>>>>>
>>>>>>> * Author: Tjeerd Boerman
>>>>>>> * Status: Closed
>>>>>>> * Priority: Normal
>>>>>>> * Assignee: biojava-dev list
>>>>>>> * Category: bio
>>>>>>> * Target version: BioJava 1.8 - legacy
>>>>>>> * URL:
>>>>>>>
>>>>>>> I encountered a memory problem when parsing many Genbank files 
>>>>>>> with the Biojava 1.8.1. The parsed files were protein GPFF 
>>>>>>> (GenPept Flat File format) files from the latest RefSeq release. 
>>>>>>> The application tried to parse millions of protein sequences 
>>>>>>> from these files, but an OutOfMemoryException would always occur 
>>>>>>> after some time. The used heap space would gradually increase 
>>>>>>> from a couple hundred megabytes to over 1.5 GB, until the heap 
>>>>>>> could grow no further. Upon inspection I discovered a HashMap in 
>>>>>>> RichSequenceBuilder was the culprit:
>>>>>>>
>>>>>>> public class SimpleRichObjectBuilder implements RichObjectBuilder {
>>>>>>>
>>>>>>>   private static Map objects = new HashMap();
>>>>>>>
>>>>>>>   public Object buildObject(Class clazz, List paramsList) {
>>>>>>>       ...
>>>>>>>
>>>>>>>       // return the constructed object from the hashmap if there 
>>>>>>> already
>>>>>>>       if (contents.containsKey(ourParamsList)) return 
>>>>>>> contents.get(ourParamsList);
>>>>>>>
>>>>>>>       ...
>>>>>>>
>>>>>>>       // Instantiate it with the parameters
>>>>>>>       Object o = c.newInstance(ourParamsList.toArray());
>>>>>>>
>>>>>>>       // store it for later in the singleton map
>>>>>>>       contents.put(ourParamsList, o);
>>>>>>>
>>>>>>>       ...
>>>>>>>   }
>>>>>>> }
>>>>>>>
>>>>>>> It seems the *objects* Map in SimpleRichSequenceBuilder is used 
>>>>>>> as a static cache for objects, but when many different objects 
>>>>>>> are created this cache grows out of control. I am unsure if this 
>>>>>>> is a 'true' bug, but for my application it was a definite 
>>>>>>> problem. My fix was to simply comment out the *contents.put()* 
>>>>>>> statement, but I'm sure there is a better way to resolve this - 
>>>>>>> perhaps by making the use of the cache optional through a 
>>>>>>> configuration option.
>>>>>>>
>>>>>>> ------------------------------------------------------------------------ 
>>>>>>> You have received this notification because you have either 
>>>>>>> subscribed to it, or are involved in it.
>>>>>>> To change your notification preferences, please click here and 
>>>>>>> login: http://redmine.open-bio.org
>>>>>>>
>>>>>> _______________________________________________
>>>>>> biojava-dev mailing list
>>>>>> biojava-dev at lists.open-bio.org
>>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --------------------------------
>>>>> George Waldon
>>>>>
>>>>
>>>
>>>
>>>
>>
>



More information about the biojava-dev mailing list