Forcing deallocation of large cache object in Java

醉梦人生 2021-02-04 09:36

I use a large HashMap (millions of entries) to cache values needed by an algorithm; the key is a combination of two objects encoded as a long. Since it grows continuously (because keys in

7 answers
  • 2021-02-04 10:15

    Instead of using a HashMap or other map implementation as a cache you could try to use a framework specialized in caching. A well known caching framework for Java is Ehcache.

    Caching frameworks usually let you configure expiration policies based on time (e.g. time to live, time to idle) or usage (e.g. least frequently used, least recently used); some even allow you to specify a maximum amount of memory usage.
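
    To make the "least recently used" policy concrete without pulling in a framework, here is a minimal sketch built on the JDK's LinkedHashMap rather than on Ehcache. The class name LruCache and the entry limit are assumptions for illustration; a real caching framework adds time-based expiry, statistics and thread safety on top of this idea.

    ```java
    import java.util.LinkedHashMap;
    import java.util.Map;

    // Hypothetical name: a size-bounded LRU cache using only the JDK.
    // Not thread-safe; wrap it or synchronize externally if shared.
    class LruCache<K, V> extends LinkedHashMap<K, V> {

        private final int maxEntries;

        LruCache(int maxEntries) {
            // accessOrder = true: iteration runs from least to most recently accessed
            super(16, 0.75f, true);
            this.maxEntries = maxEntries;
        }

        // LinkedHashMap calls this after each insertion; returning true
        // evicts the eldest (least recently accessed) entry.
        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > maxEntries;
        }
    }
    ```

    With a limit of 2, putting keys 1 and 2, reading key 1, then putting key 3 evicts key 2, because key 1 was touched more recently.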

  • 2021-02-04 10:19

    It sounds like you need a WeakHashMap instead:

    A hashtable-based Map implementation with weak keys. An entry in a WeakHashMap will automatically be removed when its key is no longer in ordinary use. More precisely, the presence of a mapping for a given key will not prevent the key from being discarded by the garbage collector, that is, made finalizable, finalized, and then reclaimed. When a key has been discarded its entry is effectively removed from the map, so this class behaves somewhat differently from other Map implementations.

    I'm not sure how this works with Long as keys, though. Also, this might be of interest:

    WeakHashMap is not a cache! Understanding WeakReference and SoftReference
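
    On the question of Long keys: a WeakHashMap entry survives only while something other than the map strongly references the same key object. The sketch below (class and method names are hypothetical) shows the consequence for boxed keys. A Long that the caller keeps hold of stays in the map, while a Long created by autoboxing just for the put call is usually referenced by nothing else and can vanish at any collection. Note also that Long.valueOf always caches values from -128 to 127, so entries with such keys are never cleared.

    ```java
    import java.util.Map;
    import java.util.WeakHashMap;

    class WeakLongKeys {

        // Fills a WeakHashMap with one key the caller holds and one it does not.
        static Map<Long, String> fill(Long heldKey) {
            Map<Long, String> cache = new WeakHashMap<>();
            cache.put(heldKey, "held");
            // Autoboxed key outside the small-value cache: typically only the map refers to it.
            cache.put(7_000_000_000L, "unheld");
            return cache;
        }

        public static void main(String[] args) {
            Long held = Long.valueOf(5_000_000_000L);
            Map<Long, String> cache = fill(held);

            System.gc(); // only a request; the JVM may ignore it

            System.out.println(cache.get(held)); // "held": the caller still references this key
            System.out.println(cache.size());    // 1 or 2, depending on whether a collection ran
        }
    }
    ```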

  • 2021-02-04 10:24

    Clear the hashmap:

    hashmap.clear();
    

    Then request a garbage collector run (the call is only a hint, and the JVM may ignore it):

    Runtime.getRuntime().gc();
    

    This is the Javadoc page for Runtime.gc().

  • 2021-02-04 10:29

    If you have a bit of spare memory you could implement a timeout cache where each value in the hashmap contains your long value and an insertion timestamp in milliseconds. Then have a background thread iterate over the values every X seconds and remove anything older than a chosen maximum age.

    Just my 2 cents :)
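
    The idea above could be sketched roughly as follows. All names (TimeoutCache, evictExpired, startSweeper) are made up for illustration, and the sketch assumes long keys and long values as in the question. The eviction method takes the current time as a parameter so it can be exercised without waiting for the background thread.

    ```java
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    class TimeoutCache {

        // Each cached value carries the time at which it was inserted.
        private static final class Entry {
            final long value;
            final long insertedAtMillis;

            Entry(long value, long insertedAtMillis) {
                this.value = value;
                this.insertedAtMillis = insertedAtMillis;
            }
        }

        private final Map<Long, Entry> entries = new ConcurrentHashMap<>();
        private final long maxAgeMillis;

        TimeoutCache(long maxAgeMillis) {
            this.maxAgeMillis = maxAgeMillis;
        }

        void put(long key, long value) {
            entries.put(key, new Entry(value, System.currentTimeMillis()));
        }

        // Returns null when the key is absent or has already been evicted.
        Long get(long key) {
            Entry e = entries.get(key);
            return e == null ? null : e.value;
        }

        int size() {
            return entries.size();
        }

        // Removes every entry older than maxAgeMillis relative to nowMillis.
        void evictExpired(long nowMillis) {
            entries.values().removeIf(e -> nowMillis - e.insertedAtMillis > maxAgeMillis);
        }

        // Starts a daemon thread that sweeps the map at a fixed interval.
        ScheduledExecutorService startSweeper(long periodMillis) {
            ScheduledExecutorService sweeper = Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "cache-sweeper");
                t.setDaemon(true);
                return t;
            });
            sweeper.scheduleAtFixedRate(
                    () -> evictExpired(System.currentTimeMillis()),
                    periodMillis, periodMillis, TimeUnit.MILLISECONDS);
            return sweeper;
        }
    }
    ```

    The extra timestamp and wrapper object per entry are the "spare memory" cost mentioned above; with millions of entries that overhead is worth measuring before committing to this design.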

  • 2021-02-04 10:30

    You can call HashMap.clear(). That will remove all data. Note that this will only discard all entries, but keep the internal array used to store the entries at the same size (rather than shrinking to an initial capacity). If you also need to eliminate that, the easiest way would be to discard the whole HashMap and replace it with a new instance. That of course only works if you control who has a pointer to the map.

    As for reclaiming the memory, you will have to let the garbage collector do its work.
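
    A minimal sketch of the two options, assuming the map is reachable only through one wrapper object (ResettableCache and its method names are hypothetical):

    ```java
    import java.util.HashMap;
    import java.util.Map;

    class ResettableCache {

        // The only reference to the map lives here, so replacing it
        // makes the old map, including its bucket array, unreachable.
        private Map<Long, Long> map = new HashMap<>();

        void put(long key, long value) {
            map.put(key, value);
        }

        Long get(long key) {
            return map.get(key);
        }

        int size() {
            return map.size();
        }

        // Empties the map but keeps the bucket array at its grown size.
        void clearInPlace() {
            map.clear();
        }

        // Drops the old instance so the garbage collector can reclaim all of it.
        void reset() {
            map = new HashMap<>();
        }
    }
    ```

    If other objects hold their own reference to the old map, reset() leaves them with the stale instance, which is why this only works when you control every reference.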

    Are your values also Long? In this case, you may want to look at a more (memory-) efficient implementation than the generic HashMap, such as the TLongLongHashMap found in the GNU Trove library. That should save a lot of memory.

  • 2021-02-04 10:30

    For a memory-aware cache, you may want to use Apache Commons Collections, in particular its org.apache.commons.collections.map.ReferenceMap class. The relevant Java mechanism is the soft reference. Java provides WeakHashMap for weak references, but weak references are not what you want for a cache. Java does not provide a SoftHashMap, but ReferenceMap from Apache Commons can be a workable substitute.
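
    To show what a soft-reference cache does under the hood, here is a minimal JDK-only sketch. It is not ReferenceMap's implementation; SoftValueCache is a hypothetical name, and the sketch only removes a cleared entry when that key is looked up again (a fuller version would drain a ReferenceQueue).

    ```java
    import java.lang.ref.SoftReference;
    import java.util.HashMap;
    import java.util.Map;

    class SoftValueCache<K, V> {

        // Values are held softly: the collector may clear them when memory runs low.
        private final Map<K, SoftReference<V>> map = new HashMap<>();

        void put(K key, V value) {
            map.put(key, new SoftReference<>(value));
        }

        // Returns null if the key was never cached or its value has been cleared.
        V get(K key) {
            SoftReference<V> ref = map.get(key);
            if (ref == null) {
                return null;
            }
            V value = ref.get();
            if (value == null) {
                map.remove(key); // drop the stale entry
            }
            return value;
        }

        int size() {
            return map.size();
        }
    }
    ```

    Callers must treat a null result as a cache miss and recompute the value, since an entry can disappear at any time.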

    The memory awareness of soft references is somewhat crude and inflexible. You can tune it with a few JVM options, especially -XX:SoftRefLRUPolicyMSPerMB, which sets how many milliseconds a softly reachable value is retained since it was last accessed, per megabyte of free heap (the HotSpot default is 1000). For instance, with this:

    java -XX:SoftRefLRUPolicyMSPerMB=2500
    

    the JVM will try to keep a cached value for 2.5 seconds per megabyte of free heap, whereas a WeakHashMap entry can disappear at the next collection.

    If soft references do not provide what you are looking for, then you will have to implement your own cache strategy, and, indeed, flush the map manually. This is your initial question. For flushing, you can use the clear() method, or simply create a new HashMap. The difference should be slight, and you may even have trouble simply measuring that difference.

    Alternating between "full cache" and "empty cache" may also be considered a bit crude, so you could maintain several maps. For instance, you maintain ten maps. When you look for a cached value, you look in all maps, but when you add a value, you put it in the first map only. When you want to flush, you rotate the maps: the first map becomes the second, the second becomes the third, and so on, up to the tenth map, which is discarded. A fresh first map is created. This would look like this:

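    import java.util.*;
    
    // Cache split across several maps: index 0 is the newest, the last index the oldest.
    public class Cache {
    
        private static final int MAX_SIZE = 500000;
    
        private Map[] backend;
        private int size = 0;   // total number of entries across all maps
    
        public Cache(int n)
        {
            backend = new Map[n];
            for (int i = 0; i < n; i ++)
                backend[i] = new HashMap();
        }
    
        public int size()
        {
            return size;
        }
    
        // Looks in every map, newest first.
        public Object get(Object key)
        {
            for (Map m : backend) {
                if (m.containsKey(key))
                    return m.get(key);
            }
            return null;
        }
    
        public Object put(Object key, Object value)
        {
            // Known key in the newest map: overwrite in place.
            if (backend[0].containsKey(key))
                return backend[0].put(key, value);
            int n = backend.length;
            // Known key in an older map: move it to the newest map; the total is unchanged.
            for (int i = 1; i < n; i ++) {
                Map m = backend[i];
                if (m.containsKey(key)) {
                    Object old = m.remove(key);
                    backend[0].put(key, value);
                    return old;
                }
            }
            // New key.
            backend[0].put(key, value);
            size ++;
            // Rotate once the newest map exceeds its share of MAX_SIZE, and keep rotating
            // while the total is over the limit. Testing only the total would rotate until
            // the full newest map itself fell off the end, emptying the whole cache.
            while (backend[0].size() > MAX_SIZE / n || size > MAX_SIZE) {
                size -= backend[n - 1].size();   // the oldest map is discarded
                System.arraycopy(backend, 0, backend, 1, n - 1);
                backend[0] = new HashMap();
            }
            return null;
        }
    }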
    

    The code above is completely untested, and should be enhanced with generics. However, it illustrates the main ideas: all maps are tested when reading (get()), all new values go to the first map, the total size is maintained, and when the size exceeds a given limit, maps are rotated. Note that there is some special treatment when a new value is put for a known key. Also, in this version, nothing special is done when finding a cached value, but we could "rejuvenate" accessed cached values: upon get(), when a value is found but not in the first map, it could be moved into the first map. Thus, frequently accessed values would remain cached for as long as they keep being accessed.
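
    As an illustration of the two enhancements mentioned (generics and rejuvenation on get()), here is one possible sketch. RotatingCache, its constructor parameters and the per-map share (maxSize divided by the number of maps) are assumptions of this sketch, not part of the code above. It rotates once the newest map holds more than its share, so a rotation normally discards only the oldest map, and it keeps rotating while the total exceeds maxSize.

    ```java
    import java.util.ArrayDeque;
    import java.util.Deque;
    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical generic variant; not thread-safe.
    class RotatingCache<K, V> {

        private final Deque<Map<K, V>> generations = new ArrayDeque<>(); // first = newest
        private final int maxSize;  // upper bound on the total number of entries
        private final int share;    // entries the newest map may hold before a rotation
        private int size = 0;

        RotatingCache(int generationCount, int maxSize) {
            if (generationCount < 1 || maxSize < 1)
                throw new IllegalArgumentException("generationCount and maxSize must be positive");
            this.maxSize = maxSize;
            this.share = maxSize / generationCount;
            for (int i = 0; i < generationCount; i++)
                generations.addLast(new HashMap<>());
        }

        int size() {
            return size;
        }

        V get(K key) {
            Map<K, V> newest = generations.peekFirst();
            for (Map<K, V> m : generations) {
                if (m.containsKey(key)) {
                    V value = m.get(key);
                    if (m != newest) {      // rejuvenate: move the hit into the newest map
                        m.remove(key);
                        newest.put(key, value);
                    }
                    return value;
                }
            }
            return null;
        }

        V put(K key, V value) {
            V old = null;
            boolean known = false;
            for (Map<K, V> m : generations) {
                if (m.containsKey(key)) {   // known key: remove it wherever it lives
                    old = m.remove(key);
                    known = true;
                    break;
                }
            }
            generations.peekFirst().put(key, value);
            if (!known) size++;
            while (generations.peekFirst().size() > share || size > maxSize)
                rotate();
            return old;
        }

        // Discards the oldest map and starts a fresh newest one.
        private void rotate() {
            size -= generations.removeLast().size();
            generations.addFirst(new HashMap<>());
        }
    }
    ```

    For example, with two maps and maxSize 4, putting keys 1, 2, 3, reading key 1, then putting keys 4 and 5 leaves keys 1, 4 and 5 cached: key 1 survived the rotation because the read moved it into the newest map, while keys 2 and 3 were discarded with the oldest one.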
