[Biojava-dev] [BioJava - Bug #3345] (Closed) Static object cache in SimpleRichObjectBuilder causing memory leak

George Waldon gwaldon at geneinfinity.org
Fri May 11 21:16:53 UTC 2012


Tjeerd,

It is not a bug, it is a design for a singleton map to interact with a  
database to actually address the issue you had in the first place. If  
you change it, hibernate sessions may not work properly or memory  
issues may arise if too many sequences are loaded at the same time  
because the current code also prevent duplicate objects.

What about submitting your own version of RichObjectBuilder with a  
clear description of when and how to use it?

- George

Quoting Tjeerd Boerman <twboerman at gmail.com>:

> Hi,
>
> I've fixed the memory issue for me personally, but it may be a  
> problem for future users, hence the question.
>
> As far I can see changing the default caching strategy will not  
> break any contracts. It simply means taking a small (negligible?)  
> performance hit in order to be able to run in a bounded memory  
> space. Would you agree?
>
> Tjeerd
>
> On 5/11/2012 5:34 PM, George Waldon wrote:
>> Hi Tjeerd,
>>
>> Bj1x was designed and written by Richard Holland; that would be the  
>> person to ask directly I guess:-).
>>
>> It is not desirable to change behavior retroactively. Are you still  
>> facing memory issues? Could you be more explicit on your problem?
>>
>> There are several ways we could adjust the current code without  
>> breaking the original design; e.g. use of service providers or  
>> implement a getRichObjectBuilder method to swap the builder for a  
>> particular job. Just thoughts.
>>
>> - George
>>
>>
>> Quoting Tjeerd Boerman <twboerman at gmail.com>:
>>
>>> Any thoughts on this from the Biojava authors? I strongly believe  
>>> that the default behavior of BioJava should not be "unbounded  
>>> memory usage unless you implement your own RichObjectBuilder".
>>>
>>> - Tjeerd
>>>
>>> On 4/27/2012 3:45 PM, George Waldon wrote:
>>>> You can use  
>>>> RichObjectFactory.setRichObjectBuilder(RichObjectBuilder b) to  
>>>> enable you own builder.
>>>>
>>>> -  George
>>>>
>>>> Quoting Tjeerd Boerman <twboerman at gmail.com>:
>>>>
>>>>> Hi George,
>>>>>
>>>>> A very good point. While the specification of the  
>>>>> RichObjectBuilder interface does not *enforce* that  
>>>>> implementations persist the objects, it seems to be the intended  
>>>>> use. Both implementations (BioSQLRichObjectBuilder and  
>>>>> SimpleRichObjectBuilder) persist all created objects. Creating  
>>>>> my own class would work for me, but this should be fixed in  
>>>>> BioJava codebase as well, correct? Creating another  
>>>>> RichObjectBuilder implementation that works in finite memory  
>>>>> would work, but it would have to be 'plugged in' by editing the  
>>>>> RichObjectFactory or enabled through a configuration option -  
>>>>> both seem undesirable. Any ideas?
>>>>>
>>>>> Tjeerd
>>>>>
>>>>> On 4/27/2012 6:44 AM, George Waldon wrote:
>>>>>> Hi Tjeerd,
>>>>>>
>>>>>> The use of a singleton map for rich objects with a Hibernate  
>>>>>> database is described in the biojavax documentation; see also  
>>>>>> the class BioSQLRichObjectBuilder.
>>>>>>
>>>>>> You can implement your solution simply by creating your own  
>>>>>> implementation of RichObjectBuilder as I mentioned before.  
>>>>>> Maybe you could store the cache values as WeakReference and  
>>>>>> recreated objects only when needed. Or you could filter the  
>>>>>> list of parameters, e.g. in the case of SimpleDocRef return the  
>>>>>> same singleton representing an empty DocRef if you are not  
>>>>>> interested in those, etc.
>>>>>>
>>>>>> - George
>>>>>>
>>>>>> Quoting Tjeerd Boerman <twboerman at gmail.com>:
>>>>>>
>>>>>>> Hey,
>>>>>>>
>>>>>>> I cannot reopen the bug myself, but I'm sure Andreas or  
>>>>>>> another manager of the bugtracker can.
>>>>>>>
>>>>>>> For my purposes the memory usage could be kept in check by  
>>>>>>> disabling the cache in SimpleRichObjectBuilder. It's hard to  
>>>>>>> say what the performance hit is, but it's not been noticeable  
>>>>>>> in my application. A permanent solution needs some more thought.
>>>>>>>
>>>>>>> I double-checked, and all calls to  
>>>>>>> SimpleObjectBuilder.getObject() go through  
>>>>>>> RichObjectFactory.getObject() - it effectively delegates the  
>>>>>>> call to the SimpleObjectBuilder. The SimpleObjectBuilder class  
>>>>>>> keeps a cache of unrestricted size, with one object for every  
>>>>>>> unique combination of class and parameters. The  
>>>>>>> RichObjectFactory class also keeps a cache of objects, but  
>>>>>>> this cache is restricted to containing instances for at most  
>>>>>>> 20 different object classes at any given time.
>>>>>>> So we can think of this as a level 1 cache in  
>>>>>>> RichObjectFactory, and a level 2 cache in SimpleObjectBuilder.  
>>>>>>> The level 1 cache may contain at most 20 entries, and the  
>>>>>>> level 2 cache may contain at most the amount of instantiable  
>>>>>>> classes in BioJava, which will be a couple hundred *maximum*.  
>>>>>>> I don't think this difference justifies the use of two caches.
>>>>>>>
>>>>>>> Possible fix:
>>>>>>> Unless I missed something, removing the cache from  
>>>>>>> SimpleObjectBuilder and increasing the size of the LRU cache  
>>>>>>> in RichObjectFactory to compensate seems like a good option.  
>>>>>>> This would solve the unbounded growth of the level 2 cache by  
>>>>>>> eliminating it.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> There is another issue with the level 1 cache that could cause  
>>>>>>> unbounded growth. Look at this piece of code in  
>>>>>>> RichObjectFactory.getObject():
>>>>>>>
>>>>>>> Map m = (Map)cache.get(applicationClass);
>>>>>>> if (!m.containsKey(paramsList)) {
>>>>>>>     m.put(paramsList,builder.buildObject(applicationClass,  
>>>>>>> paramsList));
>>>>>>> }
>>>>>>>
>>>>>>> For every unique combination of parameters, the object is  
>>>>>>> built and stored in the Map associated with that class. This  
>>>>>>> Map is not a LRU implementation, so creating 10,000,000  
>>>>>>> instances of a class (i.e. SimpleDocRef) with different  
>>>>>>> parameters would cause every single instance to be saved in  
>>>>>>> the cache. Such a situation would again cause memory problems.
>>>>>>>
>>>>>>> Possible fix:
>>>>>>> Well that's an easy one! Replace this simple Map cache with a  
>>>>>>> LRU implementation, as proposed before.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Tjeerd
>>>>>>>
>>>>>>>
>>>>>>> On 4/27/2012 12:03 AM, George Waldon wrote:
>>>>>>>> Hi Tjeerd,
>>>>>>>>
>>>>>>>> You can certainly reopen a bug.
>>>>>>>>
>>>>>>>> I am not the expert here, but looking at the code,  
>>>>>>>> RichObjectFactory uses a LRU cache as a temporary cache for  
>>>>>>>> objects that are reused often while SimpleRichObjectBuilder  
>>>>>>>> caches everything. I doubt you can remove this last one.
>>>>>>>>
>>>>>>>> Maybe you can write your own RichObjectBuilder that would  
>>>>>>>> cache only things you are interested in (to compare?). I bet  
>>>>>>>> you generate a lot of DocRef that do not need to be cached;  
>>>>>>>> then use RichObjectFactory.setRichObjectBuilder to set your  
>>>>>>>> partial builder before you parse. Just an idea.
>>>>>>>>
>>>>>>>> Hope this helps,
>>>>>>>> George
>>>>>>>>
>>>>>>>> Quoting Tjeerd Boerman <twboerman at gmail.com>:
>>>>>>>>
>>>>>>>>> Wow. I'm quite emberassed, but the issue is actually not  
>>>>>>>>> fixed in 1.8.2. The RichObjectFactory class uses an LRU  
>>>>>>>>> cache, while the SimpleRichObjectBuilder class uses the  
>>>>>>>>> cache described in my earlier post, and I mixed up the two.  
>>>>>>>>> Sorry about that, I must have been staring at this code for  
>>>>>>>>> too long today.
>>>>>>>>>
>>>>>>>>> It turns out the RichObjectFactory class is the only user of  
>>>>>>>>> SimpleRichObjectBuilder, and has its own LRU cache. So the  
>>>>>>>>> cache in SimpleRichObjectBuilder can probably be removed  
>>>>>>>>> altogether. If the experts agree I would be happy to write a  
>>>>>>>>> patch for this.
>>>>>>>>>
>>>>>>>>> Can the old bug report be reopened, or should I file a new  
>>>>>>>>> bug report? Or should I submit a possible patch through this  
>>>>>>>>> mailing list?
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Tjeerd
>>>>>>>>>
>>>>>>>>> On 4/26/2012 6:01 PM, redmine at redmine.open-bio.org wrote:
>>>>>>>>>> Issue #3345 has been updated by Andreas Prlic.
>>>>>>>>>>
>>>>>>>>>> * Status changed from New to Closed
>>>>>>>>>> * % Done changed from 0 to 100
>>>>>>>>>>
>>>>>>>>>> already fixed in 1.8.2
>>>>>>>>>>
>>>>>>>>>> ------------------------------------------------------------------------  Bug #3345: Static object cache in SimpleRichObjectBuilder  
>>>>>>>>>> causing
>>>>>>>>>> memory leak <https://redmine.open-bio.org/issues/3345>
>>>>>>>>>>
>>>>>>>>>> * Author: Tjeerd Boerman
>>>>>>>>>> * Status: Closed
>>>>>>>>>> * Priority: Normal
>>>>>>>>>> * Assignee: biojava-dev list
>>>>>>>>>> * Category: bio
>>>>>>>>>> * Target version: BioJava 1.8 - legacy
>>>>>>>>>> * URL:
>>>>>>>>>>
>>>>>>>>>> I encountered a memory problem when parsing many Genbank  
>>>>>>>>>> files with the Biojava 1.8.1. The parsed files were protein  
>>>>>>>>>> GPFF (GenPept Flat File format) files from the latest  
>>>>>>>>>> RefSeq release. The application tried to parse millions of  
>>>>>>>>>> protein sequences from these files, but an  
>>>>>>>>>> OutOfMemoryException would always occur after some time.  
>>>>>>>>>> The used heap space would gradually increase from a couple  
>>>>>>>>>> hundred megabytes to over 1.5 GB, until the heap could grow  
>>>>>>>>>> no further. Upon inspection I discovered a HashMap in  
>>>>>>>>>> RichSequenceBuilder was the culprit:
>>>>>>>>>>
>>>>>>>>>> public class SimpleRichObjectBuilder implements RichObjectBuilder {
>>>>>>>>>>
>>>>>>>>>> private static Map objects = new HashMap();
>>>>>>>>>>
>>>>>>>>>> public Object buildObject(Class clazz, List paramsList) {
>>>>>>>>>>     ...
>>>>>>>>>>
>>>>>>>>>>     // return the constructed object from the hashmap if  
>>>>>>>>>> there already
>>>>>>>>>>     if (contents.containsKey(ourParamsList)) return  
>>>>>>>>>> contents.get(ourParamsList);
>>>>>>>>>>
>>>>>>>>>>     ...
>>>>>>>>>>
>>>>>>>>>>     // Instantiate it with the parameters
>>>>>>>>>>     Object o = c.newInstance(ourParamsList.toArray());
>>>>>>>>>>
>>>>>>>>>>     // store it for later in the singleton map
>>>>>>>>>>     contents.put(ourParamsList, o);
>>>>>>>>>>
>>>>>>>>>>     ...
>>>>>>>>>> }
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> It seems the *objects* Map in SimpleRichSequenceBuilder is  
>>>>>>>>>> used as a static cache for objects, but when many different  
>>>>>>>>>> objects are created this cache grows out of control. I am  
>>>>>>>>>> unsure if this is a 'true' bug, but for my application it  
>>>>>>>>>> was a definite problem. My fix was to simply comment out  
>>>>>>>>>> the *contents.put()* statement, but I'm sure there is a  
>>>>>>>>>> better way to resolve this - perhaps by making the use of  
>>>>>>>>>> the cache optional through a configuration option.
>>>>>>>>>>
>>>>>>>>>> ------------------------------------------------------------------------ You have received this notification because you have either subscribed to it, or are involved in  
>>>>>>>>>> it.
>>>>>>>>>> To change your notification preferences, please click here  
>>>>>>>>>> and login: http://redmine.open-bio.org
>>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> biojava-dev mailing list
>>>>>>>>> biojava-dev at lists.open-bio.org
>>>>>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --------------------------------
>>>>>>>> George Waldon
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>>
>>
>






More information about the biojava-dev mailing list