Optimize memory usage of a collection of Strings in Java

后端 未结 7 1676
有刺的猬
有刺的猬 2021-01-04 21:21

I have a large number of name - value pairs (approx 100k) that I need to store in some sort of cache (say a hash map) where the value is a string with an average of about 30

相关标签:
7条回答
  • 2021-01-04 21:47

    String.intern is the obvious choice as Brian says. But if you don't want to intern across all the String in memory, you can use a Set to first see if the value is present. Here's untested code. You will have to work out removing from reverse map when removing from main

      class Map2<K, V> implements Map<K, V>
      {
        Map<K, V> _map = Maps.newHashMap();
        Set<V, V> _rev = Maps.newHashMap();
    
        V put(K k, V v) {
          if (_rev.containsKey(v)) {
            V prev = _rev.get(v);
            return _map.put(k, prev);
          } else {
            _rev.put(v, v);
            return _map.put(k,v);
          }
       }
    
    0 讨论(0)
  • 2021-01-04 21:54

    It somewhat depends upon how you are creating the String.

    One possible way is to use TreeSet that uses a Comparator that can compare existing Strings and the source of your new String. Use SortedSet.tailSet and an Iterator to find an existing String. Or alternatively NavigableSet.ceiling/floor or a TreeMap with a similar setup.

    I wrote a weblog entry about another technique to cache immutable objects (in particular Strings), but that is more suitable for smaller objects.

    String.intern has performance problems.

    0 讨论(0)
  • 2021-01-04 22:01

    Do not use String.intern (there have been various memory issues related to this through the years). instead, create your own cache, similar to String.intern. basically, you want a Map, where each key maps to itself. then, before caching any string, you "intern" it:

    private Map<String,WeakReference<String>> myInternMap = new WeakHashMap<String,,WeakReference<String>>();
    public String intern(String value) {
      synchronized(myInternMap) {
        WeakReference<String> curRef = myInternMap.get(value);
        String curValue = ((curRef != null) ? curRef.get() : null);
        if(curValue != null) {
          return curValue;
        }
    
        myInternMap.put(value, new WeakReference<String>(value));
        return value;
      }
    }
    

    note, you use weakreferences for the keys and values so that you don't keep references for strings which you are no longer using.

    0 讨论(0)
  • 2021-01-04 22:05

    String.intern() will help you here (most likely). It will resolve multiple instances of the same string down to one copy.

    EDIT: I suggested this would 'most likely' help. In what scenarios will it not ? Interning strings will have the effect of storing those interned string representations permanently. If the problem domain is a one-shot process, this may not be an issue. If it's a long running process (such as a web app) then you may well have a problem.

    I would hesitate to say never use interning (I would hesistate to say never do anything). However there are scenarios where it's not ideal.

    0 讨论(0)
  • 2021-01-04 22:09

    Do you actually need Strings, or do you just need any old CharSequence? If not, then consider implementing a "compact" CharSequence such as the one I suggest in the link.

    0 讨论(0)
  • 2021-01-04 22:12

    Agreed with others on not using String.intern(): once you've put a string there, it will never go away. Look to early revisions of Xerces for why this is a bad idea.

    A better solution is to use a WeakHashMap, wrapping the value in a WeakReference:

    private Map<String,WeakReference<String>> _map 
        = new WeakHashMap<String,WeakReference<String>>();
    
    public synchronized String intern(String str)
    {
        WeakReference<String> ref = _map.get(str);
        String s2 = (ref != null) ? ref.get() : null;
        if (s2 != null)
            return s2;
        str = new String(str);
        _map.put(str, new WeakReference(str));
        return str;
    }
    

    This code is from an article that I wrote on the Java reference objects. You'll find the explanation there.

    EDIT: need to create a new string here (and I'll update the article) because the original might be a substring of a far larger character array. I thought that was fixed around JDK 1.3, but apparently not (at least not in 1.5).

    0 讨论(0)
提交回复
热议问题