[Biojava-dev] [BioJava - Bug #3345] (Closed) Static object cache in SimpleRichObjectBuilder causing memory leak
Tjeerd Boerman
twboerman at gmail.com
Fri May 11 19:15:27 UTC 2012
Hi,
I've fixed the memory issue for me personally, but it may be a problem
for future users, hence the question.
As far I can see changing the default caching strategy will not break
any contracts. It simply means taking a small (negligible?) performance
hit in order to be able to run in a bounded memory space. Would you agree?
Tjeerd
On 5/11/2012 5:34 PM, George Waldon wrote:
> Hi Tjeerd,
>
> Bj1x was designed and written by Richard Holland; that would be the
> person to ask directly I guess:-).
>
> It is not desirable to change behavior retroactively. Are you still
> facing memory issues? Could you be more explicit on your problem?
>
> There are several ways we could adjust the current code without
> breaking the original design; e.g. use of service providers or
> implement a getRichObjectBuilder method to swap the builder for a
> particular job. Just thoughts.
>
> - George
>
>
> Quoting Tjeerd Boerman <twboerman at gmail.com>:
>
>> Any thoughts on this from the Biojava authors? I strongly believe
>> that the default behavior of BioJava should not be "unbounded memory
>> usage unless you implement your own RichObjectBuilder".
>>
>> - Tjeerd
>>
>> On 4/27/2012 3:45 PM, George Waldon wrote:
>>> You can use RichObjectFactory.setRichObjectBuilder(RichObjectBuilder
>>> b) to enable you own builder.
>>>
>>> - George
>>>
>>> Quoting Tjeerd Boerman <twboerman at gmail.com>:
>>>
>>>> Hi George,
>>>>
>>>> A very good point. While the specification of the RichObjectBuilder
>>>> interface does not *enforce* that implementations persist the
>>>> objects, it seems to be the intended use. Both implementations
>>>> (BioSQLRichObjectBuilder and SimpleRichObjectBuilder) persist all
>>>> created objects. Creating my own class would work for me, but this
>>>> should be fixed in BioJava codebase as well, correct? Creating
>>>> another RichObjectBuilder implementation that works in finite
>>>> memory would work, but it would have to be 'plugged in' by editing
>>>> the RichObjectFactory or enabled through a configuration option -
>>>> both seem undesirable. Any ideas?
>>>>
>>>> Tjeerd
>>>>
>>>> On 4/27/2012 6:44 AM, George Waldon wrote:
>>>>> Hi Tjeerd,
>>>>>
>>>>> The use of a singleton map for rich objects with a Hibernate
>>>>> database is described in the biojavax documentation; see also the
>>>>> class BioSQLRichObjectBuilder.
>>>>>
>>>>> You can implement your solution simply by creating your own
>>>>> implementation of RichObjectBuilder as I mentioned before. Maybe
>>>>> you could store the cache values as WeakReference and recreated
>>>>> objects only when needed. Or you could filter the list of
>>>>> parameters, e.g. in the case of SimpleDocRef return the same
>>>>> singleton representing an empty DocRef if you are not interested
>>>>> in those, etc.
>>>>>
>>>>> - George
>>>>>
>>>>> Quoting Tjeerd Boerman <twboerman at gmail.com>:
>>>>>
>>>>>> Hey,
>>>>>>
>>>>>> I cannot reopen the bug myself, but I'm sure Andreas or another
>>>>>> manager of the bugtracker can.
>>>>>>
>>>>>> For my purposes the memory usage could be kept in check by
>>>>>> disabling the cache in SimpleRichObjectBuilder. It's hard to say
>>>>>> what the performance hit is, but it's not been noticeable in my
>>>>>> application. A permanent solution needs some more thought.
>>>>>>
>>>>>> I double-checked, and all calls to
>>>>>> SimpleObjectBuilder.getObject() go through
>>>>>> RichObjectFactory.getObject() - it effectively delegates the call
>>>>>> to the SimpleObjectBuilder. The SimpleObjectBuilder class keeps a
>>>>>> cache of unrestricted size, with one object for every unique
>>>>>> combination of class and parameters. The RichObjectFactory class
>>>>>> also keeps a cache of objects, but this cache is restricted to
>>>>>> containing instances for at most 20 different object classes at
>>>>>> any given time.
>>>>>> So we can think of this as a level 1 cache in RichObjectFactory,
>>>>>> and a level 2 cache in SimpleObjectBuilder. The level 1 cache may
>>>>>> contain at most 20 entries, and the level 2 cache may contain at
>>>>>> most the amount of instantiable classes in BioJava, which will be
>>>>>> a couple hundred *maximum*. I don't think this difference
>>>>>> justifies the use of two caches.
>>>>>>
>>>>>> Possible fix:
>>>>>> Unless I missed something, removing the cache from
>>>>>> SimpleObjectBuilder and increasing the size of the LRU cache in
>>>>>> RichObjectFactory to compensate seems like a good option. This
>>>>>> would solve the unbounded growth of the level 2 cache by
>>>>>> eliminating it.
>>>>>>
>>>>>>
>>>>>>
>>>>>> There is another issue with the level 1 cache that could cause
>>>>>> unbounded growth. Look at this piece of code in
>>>>>> RichObjectFactory.getObject():
>>>>>>
>>>>>> Map m = (Map)cache.get(applicationClass);
>>>>>> if (!m.containsKey(paramsList)) {
>>>>>> m.put(paramsList,builder.buildObject(applicationClass,
>>>>>> paramsList));
>>>>>> }
>>>>>>
>>>>>> For every unique combination of parameters, the object is built
>>>>>> and stored in the Map associated with that class. This Map is not
>>>>>> a LRU implementation, so creating 10,000,000 instances of a class
>>>>>> (i.e. SimpleDocRef) with different parameters would cause every
>>>>>> single instance to be saved in the cache. Such a situation would
>>>>>> again cause memory problems.
>>>>>>
>>>>>> Possible fix:
>>>>>> Well that's an easy one! Replace this simple Map cache with a LRU
>>>>>> implementation, as proposed before.
>>>>>>
>>>>>> Regards,
>>>>>> Tjeerd
>>>>>>
>>>>>>
>>>>>> On 4/27/2012 12:03 AM, George Waldon wrote:
>>>>>>> Hi Tjeerd,
>>>>>>>
>>>>>>> You can certainly reopen a bug.
>>>>>>>
>>>>>>> I am not the expert here, but looking at the code,
>>>>>>> RichObjectFactory uses a LRU cache as a temporary cache for
>>>>>>> objects that are reused often while SimpleRichObjectBuilder
>>>>>>> caches everything. I doubt you can remove this last one.
>>>>>>>
>>>>>>> Maybe you can write your own RichObjectBuilder that would cache
>>>>>>> only things you are interested in (to compare?). I bet you
>>>>>>> generate a lot of DocRef that do not need to be cached; then use
>>>>>>> RichObjectFactory.setRichObjectBuilder to set your partial
>>>>>>> builder before you parse. Just an idea.
>>>>>>>
>>>>>>> Hope this helps,
>>>>>>> George
>>>>>>>
>>>>>>> Quoting Tjeerd Boerman <twboerman at gmail.com>:
>>>>>>>
>>>>>>>> Wow. I'm quite emberassed, but the issue is actually not fixed
>>>>>>>> in 1.8.2. The RichObjectFactory class uses an LRU cache, while
>>>>>>>> the SimpleRichObjectBuilder class uses the cache described in
>>>>>>>> my earlier post, and I mixed up the two. Sorry about that, I
>>>>>>>> must have been staring at this code for too long today.
>>>>>>>>
>>>>>>>> It turns out the RichObjectFactory class is the only user of
>>>>>>>> SimpleRichObjectBuilder, and has its own LRU cache. So the
>>>>>>>> cache in SimpleRichObjectBuilder can probably be removed
>>>>>>>> altogether. If the experts agree I would be happy to write a
>>>>>>>> patch for this.
>>>>>>>>
>>>>>>>> Can the old bug report be reopened, or should I file a new bug
>>>>>>>> report? Or should I submit a possible patch through this
>>>>>>>> mailing list?
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Tjeerd
>>>>>>>>
>>>>>>>> On 4/26/2012 6:01 PM, redmine at redmine.open-bio.org wrote:
>>>>>>>>> Issue #3345 has been updated by Andreas Prlic.
>>>>>>>>>
>>>>>>>>> * Status changed from New to Closed
>>>>>>>>> * % Done changed from 0 to 100
>>>>>>>>>
>>>>>>>>> already fixed in 1.8.2
>>>>>>>>>
>>>>>>>>> ------------------------------------------------------------------------
>>>>>>>>> Bug #3345: Static object cache in SimpleRichObjectBuilder causing
>>>>>>>>> memory leak <https://redmine.open-bio.org/issues/3345>
>>>>>>>>>
>>>>>>>>> * Author: Tjeerd Boerman
>>>>>>>>> * Status: Closed
>>>>>>>>> * Priority: Normal
>>>>>>>>> * Assignee: biojava-dev list
>>>>>>>>> * Category: bio
>>>>>>>>> * Target version: BioJava 1.8 - legacy
>>>>>>>>> * URL:
>>>>>>>>>
>>>>>>>>> I encountered a memory problem when parsing many Genbank files
>>>>>>>>> with the Biojava 1.8.1. The parsed files were protein GPFF
>>>>>>>>> (GenPept Flat File format) files from the latest RefSeq
>>>>>>>>> release. The application tried to parse millions of protein
>>>>>>>>> sequences from these files, but an OutOfMemoryException would
>>>>>>>>> always occur after some time. The used heap space would
>>>>>>>>> gradually increase from a couple hundred megabytes to over 1.5
>>>>>>>>> GB, until the heap could grow no further. Upon inspection I
>>>>>>>>> discovered a HashMap in RichSequenceBuilder was the culprit:
>>>>>>>>>
>>>>>>>>> public class SimpleRichObjectBuilder implements
>>>>>>>>> RichObjectBuilder {
>>>>>>>>>
>>>>>>>>> private static Map objects = new HashMap();
>>>>>>>>>
>>>>>>>>> public Object buildObject(Class clazz, List paramsList) {
>>>>>>>>> ...
>>>>>>>>>
>>>>>>>>> // return the constructed object from the hashmap if
>>>>>>>>> there already
>>>>>>>>> if (contents.containsKey(ourParamsList)) return
>>>>>>>>> contents.get(ourParamsList);
>>>>>>>>>
>>>>>>>>> ...
>>>>>>>>>
>>>>>>>>> // Instantiate it with the parameters
>>>>>>>>> Object o = c.newInstance(ourParamsList.toArray());
>>>>>>>>>
>>>>>>>>> // store it for later in the singleton map
>>>>>>>>> contents.put(ourParamsList, o);
>>>>>>>>>
>>>>>>>>> ...
>>>>>>>>> }
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> It seems the *objects* Map in SimpleRichSequenceBuilder is
>>>>>>>>> used as a static cache for objects, but when many different
>>>>>>>>> objects are created this cache grows out of control. I am
>>>>>>>>> unsure if this is a 'true' bug, but for my application it was
>>>>>>>>> a definite problem. My fix was to simply comment out the
>>>>>>>>> *contents.put()* statement, but I'm sure there is a better way
>>>>>>>>> to resolve this - perhaps by making the use of the cache
>>>>>>>>> optional through a configuration option.
>>>>>>>>>
>>>>>>>>> ------------------------------------------------------------------------
>>>>>>>>> You have received this notification because you have either
>>>>>>>>> subscribed to it, or are involved in it.
>>>>>>>>> To change your notification preferences, please click here and
>>>>>>>>> login: http://redmine.open-bio.org
>>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> biojava-dev mailing list
>>>>>>>> biojava-dev at lists.open-bio.org
>>>>>>>> http://lists.open-bio.org/mailman/listinfo/biojava-dev
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --------------------------------
>>>>>>> George Waldon
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
>
>
More information about the biojava-dev
mailing list