[Biojava-dev] [BioJava - Bug #3345] (Closed) Static object cache in SimpleRichObjectBuilder causing memory leak

Tjeerd Boerman twboerman at gmail.com
Fri May 11 19:15:27 UTC 2012


Hi,

I've fixed the memory issue for me personally, but it may be a problem 
for future users, hence the question.

As far I can see changing the default caching strategy will not break 
any contracts. It simply means taking a small (negligible?) performance 
hit in order to be able to run in a bounded memory space. Would you agree?

Tjeerd

On 5/11/2012 5:34 PM, George Waldon wrote:
> Hi Tjeerd,
>
> Bj1x was designed and written by Richard Holland; that would be the 
> person to ask directly I guess:-).
>
> It is not desirable to change behavior retroactively. Are you still 
> facing memory issues? Could you be more explicit on your problem?
>
> There are several ways we could adjust the current code without 
> breaking the original design; e.g. use of service providers or 
> implement a getRichObjectBuilder method to swap the builder for a 
> particular job. Just thoughts.
>
> - George
>
>
> Quoting Tjeerd Boerman <twboerman at gmail.com>:
>
>> Any thoughts on this from the Biojava authors? I strongly believe 
>> that the default behavior of BioJava should not be "unbounded memory 
>> usage unless you implement your own RichObjectBuilder".
>>
>> - Tjeerd
>>
>> On 4/27/2012 3:45 PM, George Waldon wrote:
>>> You can use RichObjectFactory.setRichObjectBuilder(RichObjectBuilder 
>>> b) to enable you own builder.
>>>
>>> -  George
>>>
>>> Quoting Tjeerd Boerman <twboerman at gmail.com>:
>>>
>>>> Hi George,
>>>>
>>>> A very good point. While the specification of the RichObjectBuilder 
>>>> interface does not *enforce* that implementations persist the 
>>>> objects, it seems to be the intended use. Both implementations 
>>>> (BioSQLRichObjectBuilder and SimpleRichObjectBuilder) persist all 
>>>> created objects. Creating my own class would work for me, but this 
>>>> should be fixed in BioJava codebase as well, correct? Creating 
>>>> another RichObjectBuilder implementation that works in finite 
>>>> memory would work, but it would have to be 'plugged in' by editing 
>>>> the RichObjectFactory or enabled through a configuration option - 
>>>> both seem undesirable. Any ideas?
>>>>
>>>> Tjeerd
>>>>
>>>> On 4/27/2012 6:44 AM, George Waldon wrote:
>>>>> Hi Tjeerd,
>>>>>
>>>>> The use of a singleton map for rich objects with a Hibernate 
>>>>> database is described in the biojavax documentation; see also the 
>>>>> class BioSQLRichObjectBuilder.
>>>>>
>>>>> You can implement your solution simply by creating your own 
>>>>> implementation of RichObjectBuilder as I mentioned before. Maybe 
>>>>> you could store the cache values as WeakReference and recreated 
>>>>> objects only when needed. Or you could filter the list of 
>>>>> parameters, e.g. in the case of SimpleDocRef return the same 
>>>>> singleton representing an empty DocRef if you are not interested 
>>>>> in those, etc.
>>>>>
>>>>> - George
>>>>>
>>>>> Quoting Tjeerd Boerman <twboerman at gmail.com>:
>>>>>
>>>>>> Hey,
>>>>>>
>>>>>> I cannot reopen the bug myself, but I'm sure Andreas or another 
>>>>>> manager of the bugtracker can.
>>>>>>
>>>>>> For my purposes the memory usage could be kept in check by 
>>>>>> disabling the cache in SimpleRichObjectBuilder. It's hard to say 
>>>>>> what the performance hit is, but it's not been noticeable in my 
>>>>>> application. A permanent solution needs some more thought.
>>>>>>
>>>>>> I double-checked, and all calls to 
>>>>>> SimpleObjectBuilder.getObject() go through 
>>>>>> RichObjectFactory.getObject() - it effectively delegates the call 
>>>>>> to the SimpleObjectBuilder. The SimpleObjectBuilder class keeps a 
>>>>>> cache of unrestricted size, with one object for every unique 
>>>>>> combination of class and parameters. The RichObjectFactory class 
>>>>>> also keeps a cache of objects, but this cache is restricted to 
>>>>>> containing instances for at most 20 different object classes at 
>>>>>> any given time.
>>>>>> So we can think of this as a level 1 cache in RichObjectFactory, 
>>>>>> and a level 2 cache in SimpleObjectBuilder. The level 1 cache may 
>>>>>> contain at most 20 entries, and the level 2 cache may contain at 
>>>>>> most the amount of instantiable classes in BioJava, which will be 
>>>>>> a couple hundred *maximum*. I don't think this difference 
>>>>>> justifies the use of two caches.
>>>>>>
>>>>>> Possible fix:
>>>>>> Unless I missed something, removing the cache from 
>>>>>> SimpleObjectBuilder and increasing the size of the LRU cache in 
>>>>>> RichObjectFactory to compensate seems like a good option. This 
>>>>>> would solve the unbounded growth of the level 2 cache by 
>>>>>> eliminating it.
>>>>>>
>>>>>>
>>>>>>
>>>>>> There is another issue with the level 1 cache that could cause 
>>>>>> unbounded growth. Look at this piece of code in 
>>>>>> RichObjectFactory.getObject():
>>>>>>
>>>>>>  Map m = (Map)cache.get(applicationClass);
>>>>>>  if (!m.containsKey(paramsList)) {
>>>>>>      m.put(paramsList,builder.buildObject(applicationClass, 
>>>>>> paramsList));
>>>>>>  }
>>>>>>
>>>>>> For every unique combination of parameters, the object is built 
>>>>>> and stored in the Map associated with that class. This Map is not 
>>>>>> a LRU implementation, so creating 10,000,000 instances of a class 
>>>>>> (i.e. SimpleDocRef) with different parameters would cause every 
>>>>>> single instance to be saved in the cache. Such a situation would 
>>>>>> again cause memory problems.
>>>>>>
>>>>>> Possible fix:
>>>>>> Well that's an easy one! Replace this simple Map cache with a LRU 
>>>>>> implementation, as proposed before.
>>>>>>
>>>>>> Regards,
>>>>>> Tjeerd
>>>>>>
>>>>>>
>>>>>> On 4/27/2012 12:03 AM, George Waldon wrote:
>>>>>>> Hi Tjeerd,
>>>>>>>
>>>>>>> You can certainly reopen a bug.
>>>>>>>
>>>>>>> I am not the expert here, but looking at the code, 
>>>>>>> RichObjectFactory uses a LRU cache as a temporary cache for 
>>>>>>> objects that are reused often while SimpleRichObjectBuilder 
>>>>>>> caches everything. I doubt you can remove this last one.
>>>>>>>
>>>>>>> Maybe you can write your own RichObjectBuilder that would cache 
>>>>>>> only things you are interested in (to compare?). I bet you 
>>>>>>> generate a lot of DocRef that do not need to be cached; then use 
>>>>>>> RichObjectFactory.setRichObjectBuilder to set your partial 
>>>>>>> builder before you parse. Just an idea.
>>>>>>>
>>>>>>> Hope this helps,
>>>>>>> George
>>>>>>>
>>>>>>> Quoting Tjeerd Boerman <twboerman at gmail.com>:
>>>>>>>
>>>>>>>> Wow. I'm quite emberassed, but the issue is actually not fixed 
>>>>>>>> in 1.8.2. The RichObjectFactory class uses an LRU cache, while 
>>>>>>>> the SimpleRichObjectBuilder class uses the cache described in 
>>>>>>>> my earlier post, and I mixed up the two. Sorry about that, I 
>>>>>>>> must have been staring at this code for too long today.
>>>>>>>>
>>>>>>>> It turns out the RichObjectFactory class is the only user of 
>>>>>>>> SimpleRichObjectBuilder, and has its own LRU cache. So the 
>>>>>>>> cache in SimpleRichObjectBuilder can probably be removed 
>>>>>>>> altogether. If the experts agree I would be happy to write a 
>>>>>>>> patch for this.
>>>>>>>>
>>>>>>>> Can the old bug report be reopened, or should I file a new bug 
>>>>>>>> report? Or should I submit a possible patch through this 
>>>>>>>> mailing list?
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Tjeerd
>>>>>>>>
>>>>>>>> On 4/26/2012 6:01 PM, redmine at redmine.open-bio.org wrote:
>>>>>>>>> Issue #3345 has been updated by Andreas Prlic.
>>>>>>>>>
>>>>>>>>> * Status changed from New to Closed
>>>>>>>>> * % Done changed from 0 to 100
>>>>>>>>>
>>>>>>>>> already fixed in 1.8.2
>>>>>>>>>
>>>>>>>>> ------------------------------------------------------------------------  
>>>>>>>>> Bug #3345: Static object cache in SimpleRichObjectBuilder causing
>>>>>>>>> memory leak <https://redmine.open-bio.org/issues/3345>
>>>>>>>>>
>>>>>>>>> * Author: Tjeerd Boerman
>>>>>>>>> * Status: Closed
>>>>>>>>> * Priority: Normal
>>>>>>>>> * Assignee: biojava-dev list
>>>>>>>>> * Category: bio
>>>>>>>>> * Target version: BioJava 1.8 - legacy
>>>>>>>>> * URL:
>>>>>>>>>
>>>>>>>>> I encountered a memory problem when parsing many Genbank files 
>>>>>>>>> with the Biojava 1.8.1. The parsed files were protein GPFF 
>>>>>>>>> (GenPept Flat File format) files from the latest RefSeq 
>>>>>>>>> release. The application tried to parse millions of protein 
>>>>>>>>> sequences from these files, but an OutOfMemoryException would 
>>>>>>>>> always occur after some time. The used heap space would 
>>>>>>>>> gradually increase from a couple hundred megabytes to over 1.5 
>>>>>>>>> GB, until the heap could grow no further. Upon inspection I 
>>>>>>>>> discovered a HashMap in RichSequenceBuilder was the culprit:
>>>>>>>>>
>>>>>>>>> public class SimpleRichObjectBuilder implements 
>>>>>>>>> RichObjectBuilder {
>>>>>>>>>
>>>>>>>>>  private static Map objects = new HashMap();
>>>>>>>>>
>>>>>>>>>  public Object buildObject(Class clazz, List paramsList) {
>>>>>>>>>      ...
>>>>>>>>>
>>>>>>>>>      // return the constructed object from the hashmap if 
>>>>>>>>> there already
>>>>>>>>>      if (contents.containsKey(ourParamsList)) return 
>>>>>>>>> contents.get(ourParamsList);
>>>>>>>>>
>>>>>>>>>      ...
>>>>>>>>>
>>>>>>>>>      // Instantiate it with the parameters
>>>>>>>>>      Object o = c.newInstance(ourParamsList.toArray());
>>>>>>>>>
>>>>>>>>>      // store it for later in the singleton map
>>>>>>>>>      contents.put(ourParamsList, o);
>>>>>>>>>
>>>>>>>>>      ...
>>>>>>>>>  }
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> It seems the *objects* Map in SimpleRichSequenceBuilder is 
>>>>>>>>> used as a static cache for objects, but when many different 
>>>>>>>>> objects are created this cache grows out of control. I am 
>>>>>>>>> unsure if this is a 'true' bug, but for my application it was 
>>>>>>>>> a definite problem. My fix was to simply comment out the 
>>>>>>>>> *contents.put()* statement, but I'm sure there is a better way 
>>>>>>>>> to resolve this - perhaps by making the use of the cache 
>>>>>>>>> optional through a configuration option.
>>>>>>>>>
>>>>>>>>> ------------------------------------------------------------------------ 
>>>>>>>>> You have received this notification because you have either 
>>>>>>>>> subscribed to it, or are involved in it.
>>>>>>>>> To change your notification preferences, please click here and 
>>>>>>>>> login: http://redmine.open-bio.org
>>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> biojava-dev mailing list
>>>>>>>> biojava-dev at lists.open-bio.org
>>>>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --------------------------------
>>>>>>> George Waldon
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
>
>



More information about the biojava-dev mailing list