[Biojava-dev] [BioJava - Bug #3345] (Closed) Static object cache in SimpleRichObjectBuilder causing memory leak
George Waldon
gwaldon at geneinfinity.org
Fri Apr 27 13:45:17 UTC 2012
You can use RichObjectFactory.setRichObjectBuilder(RichObjectBuilder
b) to enable you own builder.
- George
Quoting Tjeerd Boerman <twboerman at gmail.com>:
> Hi George,
>
> A very good point. While the specification of the RichObjectBuilder
> interface does not *enforce* that implementations persist the
> objects, it seems to be the intended use. Both implementations
> (BioSQLRichObjectBuilder and SimpleRichObjectBuilder) persist all
> created objects. Creating my own class would work for me, but this
> should be fixed in BioJava codebase as well, correct? Creating
> another RichObjectBuilder implementation that works in finite memory
> would work, but it would have to be 'plugged in' by editing the
> RichObjectFactory or enabled through a configuration option - both
> seem undesirable. Any ideas?
>
> Tjeerd
>
> On 4/27/2012 6:44 AM, George Waldon wrote:
>> Hi Tjeerd,
>>
>> The use of a singleton map for rich objects with a Hibernate
>> database is described in the biojavax documentation; see also the
>> class BioSQLRichObjectBuilder.
>>
>> You can implement your solution simply by creating your own
>> implementation of RichObjectBuilder as I mentioned before. Maybe
>> you could store the cache values as WeakReference and recreated
>> objects only when needed. Or you could filter the list of
>> parameters, e.g. in the case of SimpleDocRef return the same
>> singleton representing an empty DocRef if you are not interested in
>> those, etc.
>>
>> - George
>>
>> Quoting Tjeerd Boerman <twboerman at gmail.com>:
>>
>>> Hey,
>>>
>>> I cannot reopen the bug myself, but I'm sure Andreas or another
>>> manager of the bugtracker can.
>>>
>>> For my purposes the memory usage could be kept in check by
>>> disabling the cache in SimpleRichObjectBuilder. It's hard to say
>>> what the performance hit is, but it's not been noticeable in my
>>> application. A permanent solution needs some more thought.
>>>
>>> I double-checked, and all calls to SimpleObjectBuilder.getObject()
>>> go through RichObjectFactory.getObject() - it effectively
>>> delegates the call to the SimpleObjectBuilder. The
>>> SimpleObjectBuilder class keeps a cache of unrestricted size, with
>>> one object for every unique combination of class and parameters.
>>> The RichObjectFactory class also keeps a cache of objects, but
>>> this cache is restricted to containing instances for at most 20
>>> different object classes at any given time.
>>> So we can think of this as a level 1 cache in RichObjectFactory,
>>> and a level 2 cache in SimpleObjectBuilder. The level 1 cache may
>>> contain at most 20 entries, and the level 2 cache may contain at
>>> most the amount of instantiable classes in BioJava, which will be
>>> a couple hundred *maximum*. I don't think this difference
>>> justifies the use of two caches.
>>>
>>> Possible fix:
>>> Unless I missed something, removing the cache from
>>> SimpleObjectBuilder and increasing the size of the LRU cache in
>>> RichObjectFactory to compensate seems like a good option. This
>>> would solve the unbounded growth of the level 2 cache by
>>> eliminating it.
>>>
>>>
>>>
>>> There is another issue with the level 1 cache that could cause
>>> unbounded growth. Look at this piece of code in
>>> RichObjectFactory.getObject():
>>>
>>> Map m = (Map)cache.get(applicationClass);
>>> if (!m.containsKey(paramsList)) {
>>> m.put(paramsList,builder.buildObject(applicationClass, paramsList));
>>> }
>>>
>>> For every unique combination of parameters, the object is built
>>> and stored in the Map associated with that class. This Map is not
>>> a LRU implementation, so creating 10,000,000 instances of a class
>>> (i.e. SimpleDocRef) with different parameters would cause every
>>> single instance to be saved in the cache. Such a situation would
>>> again cause memory problems.
>>>
>>> Possible fix:
>>> Well that's an easy one! Replace this simple Map cache with a LRU
>>> implementation, as proposed before.
>>>
>>> Regards,
>>> Tjeerd
>>>
>>>
>>> On 4/27/2012 12:03 AM, George Waldon wrote:
>>>> Hi Tjeerd,
>>>>
>>>> You can certainly reopen a bug.
>>>>
>>>> I am not the expert here, but looking at the code,
>>>> RichObjectFactory uses a LRU cache as a temporary cache for
>>>> objects that are reused often while SimpleRichObjectBuilder
>>>> caches everything. I doubt you can remove this last one.
>>>>
>>>> Maybe you can write your own RichObjectBuilder that would cache
>>>> only things you are interested in (to compare?). I bet you
>>>> generate a lot of DocRef that do not need to be cached; then use
>>>> RichObjectFactory.setRichObjectBuilder to set your partial
>>>> builder before you parse. Just an idea.
>>>>
>>>> Hope this helps,
>>>> George
>>>>
>>>> Quoting Tjeerd Boerman <twboerman at gmail.com>:
>>>>
>>>>> Wow. I'm quite emberassed, but the issue is actually not fixed
>>>>> in 1.8.2. The RichObjectFactory class uses an LRU cache, while
>>>>> the SimpleRichObjectBuilder class uses the cache described in my
>>>>> earlier post, and I mixed up the two. Sorry about that, I must
>>>>> have been staring at this code for too long today.
>>>>>
>>>>> It turns out the RichObjectFactory class is the only user of
>>>>> SimpleRichObjectBuilder, and has its own LRU cache. So the cache
>>>>> in SimpleRichObjectBuilder can probably be removed altogether.
>>>>> If the experts agree I would be happy to write a patch for this.
>>>>>
>>>>> Can the old bug report be reopened, or should I file a new bug
>>>>> report? Or should I submit a possible patch through this mailing
>>>>> list?
>>>>>
>>>>> Regards,
>>>>> Tjeerd
>>>>>
>>>>> On 4/26/2012 6:01 PM, redmine at redmine.open-bio.org wrote:
>>>>>> Issue #3345 has been updated by Andreas Prlic.
>>>>>>
>>>>>> * Status changed from New to Closed
>>>>>> * % Done changed from 0 to 100
>>>>>>
>>>>>> already fixed in 1.8.2
>>>>>>
>>>>>> ------------------------------------------------------------------------ Bug #3345: Static object cache in SimpleRichObjectBuilder
>>>>>> causing
>>>>>> memory leak <https://redmine.open-bio.org/issues/3345>
>>>>>>
>>>>>> * Author: Tjeerd Boerman
>>>>>> * Status: Closed
>>>>>> * Priority: Normal
>>>>>> * Assignee: biojava-dev list
>>>>>> * Category: bio
>>>>>> * Target version: BioJava 1.8 - legacy
>>>>>> * URL:
>>>>>>
>>>>>> I encountered a memory problem when parsing many Genbank files
>>>>>> with the Biojava 1.8.1. The parsed files were protein GPFF
>>>>>> (GenPept Flat File format) files from the latest RefSeq
>>>>>> release. The application tried to parse millions of protein
>>>>>> sequences from these files, but an OutOfMemoryException would
>>>>>> always occur after some time. The used heap space would
>>>>>> gradually increase from a couple hundred megabytes to over 1.5
>>>>>> GB, until the heap could grow no further. Upon inspection I
>>>>>> discovered a HashMap in RichSequenceBuilder was the culprit:
>>>>>>
>>>>>> public class SimpleRichObjectBuilder implements RichObjectBuilder {
>>>>>>
>>>>>> private static Map objects = new HashMap();
>>>>>>
>>>>>> public Object buildObject(Class clazz, List paramsList) {
>>>>>> ...
>>>>>>
>>>>>> // return the constructed object from the hashmap if there already
>>>>>> if (contents.containsKey(ourParamsList)) return
>>>>>> contents.get(ourParamsList);
>>>>>>
>>>>>> ...
>>>>>>
>>>>>> // Instantiate it with the parameters
>>>>>> Object o = c.newInstance(ourParamsList.toArray());
>>>>>>
>>>>>> // store it for later in the singleton map
>>>>>> contents.put(ourParamsList, o);
>>>>>>
>>>>>> ...
>>>>>> }
>>>>>> }
>>>>>>
>>>>>> It seems the *objects* Map in SimpleRichSequenceBuilder is used
>>>>>> as a static cache for objects, but when many different objects
>>>>>> are created this cache grows out of control. I am unsure if
>>>>>> this is a 'true' bug, but for my application it was a definite
>>>>>> problem. My fix was to simply comment out the *contents.put()*
>>>>>> statement, but I'm sure there is a better way to resolve this -
>>>>>> perhaps by making the use of the cache optional through a
>>>>>> configuration option.
>>>>>>
>>>>>> ------------------------------------------------------------------------ You have received this notification because you have either subscribed to it, or are involved in
>>>>>> it.
>>>>>> To change your notification preferences, please click here and
>>>>>> login: http://redmine.open-bio.org
>>>>>>
>>>>> _______________________________________________
>>>>> biojava-dev mailing list
>>>>> biojava-dev at lists.open-bio.org
>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>>>>
>>>>
>>>>
>>>>
>>>> --------------------------------
>>>> George Waldon
>>>>
>>>
>>
>>
>>
>
More information about the biojava-dev
mailing list