I want to ask a question about avoiding String duplicates in Java.
The context is: an XML document with tags and attributes like this one:
While String.intern() could solve that problem by reducing each value to a single unique String instance, it would introduce another problem: every intern()-ed String can survive for a long time in the JVM. If the IDs vary a lot (i.e. they are not drawn from a limited set, but can be any value), then this can have massive negative effects in the long run.
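To make the "single unique instance" part concrete, here is a minimal sketch using only the JDK. The class name InternDemo, the helper parseId and the value "item-42" are made up for illustration; parseId merely stands in for whatever parser hands you a fresh String per attribute.

```java
public class InternDemo {
    // Hypothetical helper: returns a fresh String instance, the way a parser
    // would for every attribute value it reads.
    static String parseId(String raw) {
        return new String(raw.toCharArray());
    }

    public static void main(String[] args) {
        String a = parseId("item-42");
        String b = parseId("item-42");

        System.out.println(a.equals(b));              // true: same content
        System.out.println(a == b);                   // false: two separate instances
        System.out.println(a.intern() == b.intern()); // true: one canonical instance
    }
}
```

Note that the canonical instance lives in the JVM-wide pool, which is exactly the part you have no control over.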
Edit: I used to claim that intern()-ed Strings can't ever be GCed, but @nanda proved me wrong with this JavaWorld article. While this somewhat reduces the problem introduced by intern(), it does not remove it entirely: the pool provided by intern() can't be controlled and can have unexpected results with regard to garbage collection.
Luckily, Guava provides a solution in the form of the Interner interface and its helper class Interners: using Interners.newStrongInterner() you can create an object that acts as a "pool" of unique String objects in much the same way as String.intern() does, except that the pool is bound to that instance; if you discard the pool, its contents can become eligible for garbage collection as well.
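The idea of a pool that is bound to an instance can be sketched with plain JDK classes. To be clear, this is not Guava's implementation: PoolSketch and SimpleInterner are hypothetical names, and the ConcurrentHashMap is just my assumption of a reasonable backing store. With Guava on the classpath you would instead write `Interner<String> ids = Interners.newStrongInterner();` and call `ids.intern(value)` in the same way.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class PoolSketch {
    // Hypothetical stand-in for an instance-bound strong interner.
    public static final class SimpleInterner<E> {
        private final ConcurrentMap<E, E> pool = new ConcurrentHashMap<>();

        // Returns the pooled instance equal to sample, adding sample if absent.
        public E intern(E sample) {
            E existing = pool.putIfAbsent(sample, sample);
            return existing != null ? existing : sample;
        }
    }

    public static void main(String[] args) {
        SimpleInterner<String> ids = new SimpleInterner<>();

        // Two fresh instances with the same content, as a parser would produce.
        String a = ids.intern(new String("item-42".toCharArray()));
        String b = ids.intern(new String("item-42".toCharArray()));

        System.out.println(a == b); // true: both refer to the pooled instance

        // Dropping the last reference to the pool makes the pool itself
        // unreachable; the JVM-wide intern() pool is never involved.
        ids = null;
    }
}
```

The point of this design is lifetime: once the parsed document and the interner are both unreachable, nothing else keeps the pooled Strings alive.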