Performance penalty of String.intern()

猫巷女王i 2020-11-28 04:56

Lots of people talk about the performance advantages of String.intern(), but I'm actually more interested in what the performance penalty may be.

My main concerns a

5 answers
  • 2020-11-28 05:23

    I did a little benchmarking myself. For the search cost, I decided to compare String.intern() with ConcurrentHashMap.putIfAbsent(s, s). Basically, those two methods do the same thing, except that String.intern() is a native method that stores to and reads from a SymbolTable managed directly by the JVM, while ConcurrentHashMap.putIfAbsent() is just a normal instance method.
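    For readers who want to see what the ConcurrentHashMap side of this comparison looks like, here is a minimal sketch, assuming the pool simply maps each string to itself. The class name MapInterner is hypothetical and this is not the benchmark's actual gist code; it only relies on the documented contract of putIfAbsent (returns the previous value, or null if there was none).

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper: a string pool backed by ConcurrentHashMap instead of the JVM's string table.
final class MapInterner {
    private final ConcurrentHashMap<String, String> pool = new ConcurrentHashMap<>();

    // Returns the canonical instance equal to s, inserting s itself if none exists yet.
    String intern(String s) {
        String existing = pool.putIfAbsent(s, s);
        return existing != null ? existing : s;
    }

    int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        MapInterner interner = new MapInterner();
        String a = new String("hello");
        String b = new String("hello");
        System.out.println(a == b);                                   // false: distinct instances
        System.out.println(interner.intern(a) == interner.intern(b)); // true: both resolve to a
        System.out.println(interner.size());                          // 1
    }
}
```

    Because the map is ordinary heap data, its capacity grows with the number of entries, which is consistent with the flat per-lookup cost in the ConcurrentHashMap table.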

    You can find the benchmark code in a GitHub Gist (for lack of a better place to put it). The JVM options I used (to verify that the benchmark is not skewed) are in the comments at the top of the source file.

    Anyway, here are the results:

    Search cost (single threaded)

    Legend

    • count: the number of distinct strings that we are trying to pool
    • initial intern: the time in ms it took to insert all the strings into the string pool
    • lookup same string: the time in ms it took to look up each of the strings again in the pool, using exactly the same instance that was previously added to the pool
    • lookup equal string: the time in ms it took to look up each of the strings again in the pool, but using a different (equal) instance

    String.intern()

    count       initial intern   lookup same string  lookup equal string
    1'000'000            40206                34698                35000
      400'000             5198                 4481                 4477
      200'000              955                  828                  803
      100'000              234                  215                  220
       80'000              110                   94                   99
       40'000               52                   30                   32
       20'000               20                   10                   13
       10'000                7                    5                    7
    

    ConcurrentHashMap.putIfAbsent()

    count       initial intern   lookup same string  lookup equal string
    1'000'000              411                  246                  309
      800'000              352                  194                  229
      400'000              162                   95                  114
      200'000               78                   50                   55
      100'000               41                   28                   28
       80'000               31                   23                   22
       40'000               20                   14                   16
       20'000               12                    6                    7
       10'000                9                    5                    3
    

    The conclusion for the search cost: String.intern() is surprisingly expensive to call. It scales extremely badly: the cost of a single lookup grows roughly as O(n), where n is the number of strings in the pool. As the pool grows, the time to look up one string grows with it (about 0.7 microseconds per lookup with 10'000 strings versus about 40 microseconds per lookup with 1'000'000 strings).

    ConcurrentHashMap scales as expected: the number of strings in the pool has little impact on the speed of a lookup.
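    The per-lookup figures quoted above come from dividing the total milliseconds in a row by its string count. This throwaway sketch (the class name PerOpCost is hypothetical) redoes that arithmetic for the first and last rows of the "initial intern" column of both tables, using only the numbers already shown.

```java
// Converts "total ms for count operations" into microseconds per operation.
final class PerOpCost {
    static double microsPerOp(long totalMillis, long count) {
        return totalMillis * 1000.0 / count;
    }

    public static void main(String[] args) {
        // Figures copied from the String.intern() and ConcurrentHashMap tables above.
        System.out.printf("intern,     10'000 strings: %.2f us/op%n", microsPerOp(7, 10_000));
        System.out.printf("intern,  1'000'000 strings: %.2f us/op%n", microsPerOp(40_206, 1_000_000));
        System.out.printf("CHM,        10'000 strings: %.2f us/op%n", microsPerOp(9, 10_000));
        System.out.printf("CHM,     1'000'000 strings: %.2f us/op%n", microsPerOp(411, 1_000_000));
    }
}
```

    This prints 0.70, 40.21, 0.90 and 0.41 us/op: for String.intern() the per-operation cost rises by a factor of about 57 when the pool grows 100-fold, while the ConcurrentHashMap cost stays below one microsecond.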

    Based on this experiment, I'd strongly suggest avoiding String.intern() if you are going to intern more than a few strings.

  • 2020-11-28 05:41

    String.intern() becomes slow for two reasons:
    1. The -XX:StringTableSize limitation.
    Java uses an internal hash table to manage the string cache. In Java 6 the default StringTableSize value is 1009, so each String.intern() call costs roughly O(number of pooled strings / 1009); as more and more strings are interned, it keeps getting slower.

    \openjdk7\hotspot\src\share\vm\classfile\symbolTable.cpp

    oop StringTable::intern(Handle string_or_null, jchar* name,  
                            int len, TRAPS) {  
      unsigned int hashValue = java_lang_String::hash_string(name, len);  
      int index = the_table()->hash_to_index(hashValue);  
      oop string = the_table()->lookup(index, name, len, hashValue);  
      // Found  
      if (string != NULL) return string;  
      // Otherwise, add to symbol to table  
      return the_table()->basic_add(index, string_or_null, name, len,  
                                    hashValue, CHECK_NULL);  
    }
    

    2. In Java 6, the string pool lives in the permanent generation (PermGen) rather than in the regular heap, and PermGen is usually configured to be relatively small.
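    To make the "O(number of strings / 1009)" point concrete, here is a hypothetical Java model of a separately chained table whose bucket count never changes. It is only loosely modeled on the StringTable::intern code quoted above (hash to a bucket, scan it, add if missing); the class name FixedBucketTable is invented for illustration, and it uses String.hashCode() rather than HotSpot's internal hashing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a fixed-size, separately chained intern table (no resizing).
final class FixedBucketTable {
    private final List<List<String>> buckets;

    FixedBucketTable(int bucketCount) {
        buckets = new ArrayList<>(bucketCount);
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    // Returns the pooled instance equal to s, adding s if absent.
    // Each call scans one chain, i.e. about size / bucketCount comparisons on average.
    String intern(String s) {
        List<String> chain = buckets.get(Math.floorMod(s.hashCode(), buckets.size()));
        for (String candidate : chain) {
            if (candidate.equals(s)) {
                return candidate;
            }
        }
        chain.add(s);
        return s;
    }

    // Average chain length = number of entries / number of buckets.
    double averageChainLength() {
        long total = 0;
        for (List<String> chain : buckets) {
            total += chain.size();
        }
        return (double) total / buckets.size();
    }

    public static void main(String[] args) {
        FixedBucketTable table = new FixedBucketTable(1009);
        for (int i = 0; i < 100_000; i++) {
            table.intern("s" + i);
        }
        System.out.printf("average chain length: %.1f%n", table.averageChainLength());
    }
}
```

    With 1009 buckets and 100'000 distinct strings this prints an average chain length of 99.1, so every further lookup walks about a hundred entries; at 1'000'000 strings it would be about 991.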

  • 2020-11-28 05:46

    I've found it better to use a fastutil hash table and do my own interning rather than reuse String.intern(). Using my own hashtable means that I can make my own decisions about concurrency, and I'm not competing for PermGen space.

    I did this because I was working on a problem that had, as it were, millions of strings, many of them identical, and I wanted to (a) reduce the memory footprint and (b) allow comparison by identity. For my problem, things were better with interning than without, using my non-String.intern() approach.
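    A minimal sketch of this kind of do-it-yourself interning, assuming single-threaded use. java.util.HashMap is swapped in here for the fastutil table the answer mentions (no fastutil API is used), and the class name LocalInterner is hypothetical. It shows both goals: duplicates collapse to one instance, and == becomes a valid comparison afterwards.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical single-threaded interner; not safe to share between threads.
final class LocalInterner {
    private final Map<String, String> pool = new HashMap<>();

    // The first time a value is seen, it becomes its own canonical instance.
    String intern(String s) {
        return pool.computeIfAbsent(s, k -> k);
    }

    int distinctCount() {
        return pool.size();
    }

    public static void main(String[] args) {
        LocalInterner interner = new LocalInterner();
        String[] tokens = "a b a c b a".split(" ");
        for (int i = 0; i < tokens.length; i++) {
            tokens[i] = interner.intern(tokens[i]);
        }
        // Equal tokens now share one instance, so identity comparison works.
        System.out.println(tokens[0] == tokens[2]);   // true
        System.out.println(interner.distinctCount()); // 3
    }
}
```

    Owning the table means the pool lives on the ordinary heap, can be dropped when the job is done, and uses whatever locking (or none) the application needs.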

    YMMV.

  • 2020-11-28 05:46

    The following micro benchmark suggests that using an enum offers roughly a tenfold performance improvement (the usual micro benchmark caveats apply). Test code:

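    import java.util.HashMap;
    import java.util.Map;

    public class Test {
       private enum E {
          E1;
          // Name -> constant lookup table, built once when the enum is initialised.
          private static final Map<String, E> named = new HashMap<String, E>();
          static {
             for (E e : E.values()) {
                named.put( e.name(), e );
             }
          }
    
          private static E get(String s) {
             return named.get( s );
          }
       }
    
       public static void main(String... strings) {
          E e = E.get( "E1" ); // ensure map is initialised
    
          long start = System.nanoTime();
          testIntern( 10000000 );
          long end = System.nanoTime();
          System.out.println( "testIntern: " + 1E-9 * (end - start) + " s" );
    
          start = System.nanoTime();
          testMap( 10000000 );
          end = System.nanoTime();
          System.out.println( "testMap:    " + 1E-9 * (end - start) + " s" );
       }
    
       // Interns the same literal repeatedly (it is already in the pool).
       // Results are discarded, so the JIT may optimise parts of these loops away.
       private static void testIntern(int num) {
          for (int i = 0; i < num; i++) {
             String s = "E1".intern();
          }
       }
    
       // Resolves the same name repeatedly through the enum's HashMap.
       private static void testMap(int num) {
          for (int i = 0; i < num; i++) {
             E e = E.get( "E1" );
          }
       }
    }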
    

    Results (10 million iterations): testIntern(): 0.8 seconds; testMap(): 0.06 seconds.

    Of course YMMV, but enums offer so many benefits over Strings (type safety against arbitrary other Strings, the ability to add methods, etc.) that they seem the best way to go, IMHO.
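    As a small illustration of those enum benefits, here is a hypothetical enum (Color and its hex codes are invented for the example). The JDK's built-in Enum.valueOf already maps a name to its constant, throwing IllegalArgumentException for unknown names, and each constant can carry its own data and methods.

```java
// Hypothetical closed set of names modelled as an enum instead of interned Strings.
enum Color {
    RED("#ff0000"), GREEN("#00ff00");

    private final String hex;

    Color(String hex) {
        this.hex = hex;
    }

    // Behaviour attached to the value, which a plain String cannot carry.
    String hex() {
        return hex;
    }

    public static void main(String[] args) {
        Color c = Color.valueOf("RED");     // name -> constant lookup provided by the JDK
        System.out.println(c == Color.RED); // true: each constant is a single instance
        System.out.println(c.hex());        // #ff0000
    }
}
```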

  • 2020-11-28 05:47

    I have recently written an article about the String.intern() implementation in Java 6, 7 and 8: String.intern in Java 6, 7 and 8 - string pooling.

    There is a -XX:StringTableSize JVM parameter that lets you make String.intern() extremely useful in Java 7+ by enlarging the string table. So, unfortunately, I have to say that this question currently gives readers misleading information.
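    One way to check the effect of -XX:StringTableSize yourself is a tiny timing harness like the one below, run once with a small table and once with a large one. This is a rough sketch, not a rigorous benchmark: the class name InternTiming is hypothetical, the table sizes in the comment are only example values, and no warm-up or JIT control is done.

```java
// Hypothetical timing harness: interns `count` distinct strings and reports the average cost.
// Example runs (table sizes are illustrative):
//   java -XX:StringTableSize=1009 InternTiming 1000000
//   java -XX:StringTableSize=1000003 InternTiming 1000000
final class InternTiming {
    // Interns count distinct strings and returns the elapsed time in nanoseconds.
    static long timeInterns(int count) {
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            // String.valueOf(int) builds a new String, so each call adds or finds a pool entry.
            String.valueOf(i).intern();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int count = args.length > 0 ? Integer.parseInt(args[0]) : 100_000;
        long nanos = timeInterns(count);
        System.out.printf("%d interns: %.3f us/op%n", count, nanos / 1000.0 / count);
    }
}
```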
