Consuming stack traces noticeably slower in Java 11 than Java 8

前端 未结 2 2003
攒了一身酷
攒了一身酷 2020-12-24 06:33

I was comparing the performance of JDK 8 and 11 using jmh 1.21 when I ran across some surprising numbers:

Java versi         


        
相关标签:
2条回答
  • 2020-12-24 07:09

    I investigated the issue with async-profiler which can draw cool flame graphs demonstrating where the CPU time is spent.

    As @AlekseyShipilev pointed out, the slowdown between JDK 8 and JDK 9 is mainly the result of StackWalker changes. Also G1 has become the default GC since JDK 9. If we explicitly set -XX:+UseParallelGC (default in JDK 8), the scores will be slightly better.

    But the most interesting part is the slowdown in JDK 11.
    Here is what async-profiler shows (clickable SVG).

    The main difference between two profiles is in the size of java_lang_Throwable::get_stack_trace_elements block, which is dominated by StringTable::intern. Apparently StringTable::intern takes much longer on JDK 11.

    Let's zoom in:

    Note that StringTable::intern in JDK 11 calls do_intern which in turn allocates a new java.lang.String object. Looks suspicious. Nothing of this kind is seen in JDK 10 profile. Time to look in the source code.

    stringTable.cpp (JDK 11)

    oop StringTable::intern(Handle string_or_null_h, jchar* name, int len, TRAPS) {
      // shared table always uses java_lang_String::hash_code
      unsigned int hash = java_lang_String::hash_code(name, len);
      oop found_string = StringTable::the_table()->lookup_shared(name, len, hash);
      if (found_string != NULL) {
        return found_string;
      }
      if (StringTable::_alt_hash) {
        hash = hash_string(name, len, true);
      }
      return StringTable::the_table()->do_intern(string_or_null_h, name, len,
                                           |     hash, CHECK_NULL);
    }                                      |
                           ----------------
                          |
                          v
    oop StringTable::do_intern(Handle string_or_null_h, const jchar* name,
                               int len, uintx hash, TRAPS) {
      HandleMark hm(THREAD);  // cleanup strings created
      Handle string_h;
    
      if (!string_or_null_h.is_null()) {
        string_h = string_or_null_h;
      } else {
        string_h = java_lang_String::create_from_unicode(name, len, CHECK_NULL);
      }
    

    The function in JDK 11 first looks for a string in the shared StringTable, does not find it, then goes to do_intern and immediately creates a new String object.

    In JDK 10 sources after a call to lookup_shared there was an additional lookup in the main table which returned the existing string without creation of a new object:

      found_string = the_table()->lookup_in_main_table(index, name, len, hashValue);
    

    This refactoring was a result of JDK-8195097 "Make it possible to process StringTable outside safepoint".

    TL;DR While interning method names in JDK 11, HotSpot creates redundant String objects. This has happened after JDK-8195097.

    0 讨论(0)
  • 2020-12-24 07:17

    I suspect this is due to several changes.

    8->9 regression happened while switching to StackWalker for generating the stack traces (JDK-8150778). Unfortunately, this made VM native code intern a lot of strings, and StringTable becomes the bottleneck. If you profile OP's benchmark, you will see the profile like in JDK-8151751. It should be enough to perf record -g the entire JVM that runs the benchmark, and then look into perf report. (Hint, hint, you can do it yourself next time!)

    And 10->11 regression must have happened later. I suspect this is due to StringTable preparations for switching to fully concurrent hash table (JDK-8195100, which, as Claes points out, is not entirely in 11) or something else (class data sharing changes?).

    Either way, interning on fast path is a bad idea, and patch for JDK-8151751 should have dealt with both regressions.

    Watch this:

    8u191: 15108 ± 99 ns/op [so far so good]

    -   54.55%     0.37%  java     libjvm.so           [.] JVM_GetStackTraceElement 
       - 54.18% JVM_GetStackTraceElement                          
          - 52.22% java_lang_Throwable::get_stack_trace_element   
             - 48.23% java_lang_StackTraceElement::create         
                - 17.82% StringTable::intern                      
                - 13.92% StringTable::intern                      
                - 4.83% Klass::external_name                      
                + 3.41% Method::line_number_from_bci              
    

    "head": 22382 ± 134 ns/op [regression]

    -   69.79%     0.05%  org.sample.MyBe  libjvm.so  [.] JVM_InitStackTraceElement
       - 69.73% JVM_InitStackTraceElementArray                    
          - 69.14% java_lang_Throwable::get_stack_trace_elements  
             - 66.86% java_lang_StackTraceElement::fill_in        
                - 38.48% StringTable::intern                      
                - 21.81% StringTable::intern                      
                - 2.21% Klass::external_name                      
                  1.82% Method::line_number_from_bci              
                  0.97% AccessInternal::PostRuntimeDispatch<G1BarrierSet::AccessBarrier<573
    

    "head" + JDK-8151751 patch: 7511 ± 26 ns/op [woot, even better than 8u]

    -   22.53%     0.12%  org.sample.MyBe  libjvm.so  [.] JVM_InitStackTraceElement
       - 22.40% JVM_InitStackTraceElementArray                    
          - 20.25% java_lang_Throwable::get_stack_trace_elements  
             - 12.69% java_lang_StackTraceElement::fill_in        
                + 6.86% Method::line_number_from_bci              
                  2.08% AccessInternal::PostRuntimeDispatch<G1BarrierSet::AccessBarrier
               2.24% InstanceKlass::method_with_orig_idnum        
               1.03% Handle::Handle        
    
    0 讨论(0)
提交回复
热议问题