I have a large number of name - value pairs (approx 100k) that I need to store in some sort of cache (say a hash map) where the value is a string with an average of about 30
String.intern
is the obvious choice as Brian says. But if you don't want to intern across all the String in memory, you can use a Set to first see if the value is present. Here's untested code. You will have to work out removing from reverse map when removing from main
class Map2<K, V> implements Map<K, V>
{
Map<K, V> _map = Maps.newHashMap();
Set<V, V> _rev = Maps.newHashMap();
V put(K k, V v) {
if (_rev.containsKey(v)) {
V prev = _rev.get(v);
return _map.put(k, prev);
} else {
_rev.put(v, v);
return _map.put(k,v);
}
}
It somewhat depends upon how you are creating the String
.
One possible way is to use TreeSet
that uses a Comparator
that can compare existing String
s and the source of your new String
. Use SortedSet.tailSet
and an Iterator
to find an existing String
. Or alternatively NavigableSet.ceiling/floor
or a TreeMap
with a similar setup.
I wrote a weblog entry about another technique to cache immutable objects (in particular Strings), but that is more suitable for smaller objects.
String.intern
has performance problems.
Do not use String.intern (there have been various memory issues related to this through the years). instead, create your own cache, similar to String.intern. basically, you want a Map, where each key maps to itself. then, before caching any string, you "intern" it:
private Map<String,WeakReference<String>> myInternMap = new WeakHashMap<String,,WeakReference<String>>();
public String intern(String value) {
synchronized(myInternMap) {
WeakReference<String> curRef = myInternMap.get(value);
String curValue = ((curRef != null) ? curRef.get() : null);
if(curValue != null) {
return curValue;
}
myInternMap.put(value, new WeakReference<String>(value));
return value;
}
}
note, you use weakreferences for the keys and values so that you don't keep references for strings which you are no longer using.
String.intern() will help you here (most likely). It will resolve multiple instances of the same string down to one copy.
EDIT: I suggested this would 'most likely' help. In what scenarios will it not ? Interning strings will have the effect of storing those interned string representations permanently. If the problem domain is a one-shot process, this may not be an issue. If it's a long running process (such as a web app) then you may well have a problem.
I would hesitate to say never use interning (I would hesistate to say never do anything). However there are scenarios where it's not ideal.
Do you actually need Strings, or do you just need any old CharSequence? If not, then consider implementing a "compact" CharSequence such as the one I suggest in the link.
Agreed with others on not using String.intern(): once you've put a string there, it will never go away. Look to early revisions of Xerces for why this is a bad idea.
A better solution is to use a WeakHashMap, wrapping the value in a WeakReference:
private Map<String,WeakReference<String>> _map
= new WeakHashMap<String,WeakReference<String>>();
public synchronized String intern(String str)
{
WeakReference<String> ref = _map.get(str);
String s2 = (ref != null) ? ref.get() : null;
if (s2 != null)
return s2;
str = new String(str);
_map.put(str, new WeakReference(str));
return str;
}
This code is from an article that I wrote on the Java reference objects. You'll find the explanation there.
EDIT: need to create a new string here (and I'll update the article) because the original might be a substring of a far larger character array. I thought that was fixed around JDK 1.3, but apparently not (at least not in 1.5).