Consuming stack traces noticeably slower in Java 11 than Java 8

旧城冷巷雨未停 提交于 2019-12-02 16:35:30

I investigated the issue with async-profiler which can draw cool flame graphs demonstrating where the CPU time is spent.

As @AlekseyShipilev pointed out, the slowdown between JDK 8 and JDK 9 is mainly the result of StackWalker changes. Also G1 has become the default GC since JDK 9. If we explicitly set -XX:+UseParallelGC (default in JDK 8), the scores will be slightly better.

But the most interesting part is the slowdown in JDK 11.
Here is what async-profiler shows (clickable SVG).

The main difference between two profiles is in the size of java_lang_Throwable::get_stack_trace_elements block, which is dominated by StringTable::intern. Apparently StringTable::intern takes much longer on JDK 11.

Let's zoom in:

Note that StringTable::intern in JDK 11 calls do_intern which in turn allocates a new java.lang.String object. Looks suspicious. Nothing of this kind is seen in JDK 10 profile. Time to look in the source code.

stringTable.cpp (JDK 11)

oop StringTable::intern(Handle string_or_null_h, jchar* name, int len, TRAPS) {
  // shared table always uses java_lang_String::hash_code
  unsigned int hash = java_lang_String::hash_code(name, len);
  oop found_string = StringTable::the_table()->lookup_shared(name, len, hash);
  if (found_string != NULL) {
    return found_string;
  }
  if (StringTable::_alt_hash) {
    hash = hash_string(name, len, true);
  }
  return StringTable::the_table()->do_intern(string_or_null_h, name, len,
                                       |     hash, CHECK_NULL);
}                                      |
                       ----------------
                      |
                      v
oop StringTable::do_intern(Handle string_or_null_h, const jchar* name,
                           int len, uintx hash, TRAPS) {
  HandleMark hm(THREAD);  // cleanup strings created
  Handle string_h;

  if (!string_or_null_h.is_null()) {
    string_h = string_or_null_h;
  } else {
    string_h = java_lang_String::create_from_unicode(name, len, CHECK_NULL);
  }

The function in JDK 11 first looks for a string in the shared StringTable, does not find it, then goes to do_intern and immediately creates a new String object.

In JDK 10 sources after a call to lookup_shared there was an additional lookup in the main table which returned the existing string without creation of a new object:

  found_string = the_table()->lookup_in_main_table(index, name, len, hashValue);

This refactoring was a result of JDK-8195097 "Make it possible to process StringTable outside safepoint".

TL;DR While interning method names in JDK 11, HotSpot creates redundant String objects. This has happened after JDK-8195097.

I suspect this is due to several changes.

8->9 regression happened while switching to StackWalker for generating the stack traces (JDK-8150778). Unfortunately, this made VM native code intern a lot of strings, and StringTable becomes the bottleneck. If you profile OP's benchmark, you will see the profile like in JDK-8151751. It should be enough to perf record -g the entire JVM that runs the benchmark, and then look into perf report. (Hint, hint, you can do it yourself next time!)

And 10->11 regression must have happened later. I suspect this is due to StringTable preparations for switching to fully concurrent hash table (JDK-8195100, which, as Claes points out, is not entirely in 11) or something else (class data sharing changes?).

Either way, interning on fast path is a bad idea, and patch for JDK-8151751 should have dealt with both regressions.

Watch this:

8u191: 15108 ± 99 ns/op [so far so good]

-   54.55%     0.37%  java     libjvm.so           [.] JVM_GetStackTraceElement 
   - 54.18% JVM_GetStackTraceElement                          
      - 52.22% java_lang_Throwable::get_stack_trace_element   
         - 48.23% java_lang_StackTraceElement::create         
            - 17.82% StringTable::intern                      
            - 13.92% StringTable::intern                      
            - 4.83% Klass::external_name                      
            + 3.41% Method::line_number_from_bci              

"head": 22382 ± 134 ns/op [regression]

-   69.79%     0.05%  org.sample.MyBe  libjvm.so  [.] JVM_InitStackTraceElement
   - 69.73% JVM_InitStackTraceElementArray                    
      - 69.14% java_lang_Throwable::get_stack_trace_elements  
         - 66.86% java_lang_StackTraceElement::fill_in        
            - 38.48% StringTable::intern                      
            - 21.81% StringTable::intern                      
            - 2.21% Klass::external_name                      
              1.82% Method::line_number_from_bci              
              0.97% AccessInternal::PostRuntimeDispatch<G1BarrierSet::AccessBarrier<573

"head" + JDK-8151751 patch: 7511 ± 26 ns/op [woot, even better than 8u]

-   22.53%     0.12%  org.sample.MyBe  libjvm.so  [.] JVM_InitStackTraceElement
   - 22.40% JVM_InitStackTraceElementArray                    
      - 20.25% java_lang_Throwable::get_stack_trace_elements  
         - 12.69% java_lang_StackTraceElement::fill_in        
            + 6.86% Method::line_number_from_bci              
              2.08% AccessInternal::PostRuntimeDispatch<G1BarrierSet::AccessBarrier
           2.24% InstanceKlass::method_with_orig_idnum        
           1.03% Handle::Handle        
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!