Performance Explanation: code runs slower after warm up

误落风尘 2021-02-01 20:44

The code below runs the exact same calculation 3 times (it does not do much: basically adding all numbers from 1 to 100m). The first 2 blocks run approximately 10 times faster t

2 answers

小鲜肉 (original poster)

2021-02-01 21:08

In short: the just-in-time (JIT) compiler is dumb.

First of all you can use the option -XX:+PrintCompilation to see WHEN the JIT is doing something. Then you will see something like this:

$ java -XX:+PrintCompilation weird
    168    1             weird$CountByOne::getNext (28 bytes)
    174    1 %           weird::main @ 18 (220 bytes)
    279    1 %           weird::main @ -2 (220 bytes)   made not entrant
113727636
    280    2 %           weird::main @ 91 (220 bytes)
106265475
427228826

So you can see that the method main is compiled at some point during the first block and again during the second block.
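
To make the columns of that log easier to read, here is a minimal sketch of a parser for one line of it. It assumes the layout shown above (no tier column, and only the % attribute, which marks an on-stack-replacement compilation, i.e. a method compiled while its loop is already running). CompilationEvent and its field names are hypothetical helpers, not part of the JDK.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the JDK: parses one -XX:+PrintCompilation
// line in the layout shown above (no tier column, only the '%' attribute).
final class CompilationEvent {
    private static final Pattern LINE = Pattern.compile(
            "^\\s*(\\d+)\\s+(\\d+)\\s+(%?)\\s*(\\S+)(?:\\s+@\\s+(-?\\d+))?"
            + "\\s+\\((\\d+) bytes\\)\\s*(.*)$");

    final long timestampMs;  // milliseconds since JVM start
    final int compileId;     // sequence number of the compilation task
    final boolean osr;       // '%' marks an on-stack-replacement compilation
    final String method;     // e.g. weird::main
    final Integer osrBci;    // bytecode index after '@', or null if absent
    final int bytecodeSize;  // size of the method's bytecode in bytes
    final String note;       // trailing text such as "made not entrant"

    private CompilationEvent(Matcher m) {
        timestampMs = Long.parseLong(m.group(1));
        compileId = Integer.parseInt(m.group(2));
        osr = !m.group(3).isEmpty();
        method = m.group(4);
        osrBci = (m.group(5) == null) ? null : Integer.valueOf(m.group(5));
        bytecodeSize = Integer.parseInt(m.group(6));
        note = m.group(7).trim();
    }

    // Returns null for lines that are not compilation events,
    // such as the timing numbers printed by the program itself.
    static CompilationEvent parse(String line) {
        Matcher m = LINE.matcher(line);
        return m.matches() ? new CompilationEvent(m) : null;
    }

    public static void main(String[] args) {
        CompilationEvent e =
                parse("    174    1 %           weird::main @ 18 (220 bytes)");
        System.out.println(e.method + " osr=" + e.osr + " bci=" + e.osrBci);
    }
}
```

Read this way, both compilations of main in the log are marked % and carry a bytecode index (18, then 91), which fits the observation that main is compiled once per running loop rather than once as a whole.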

Adding the options -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly will give you more information about what the JIT is doing. Note that this requires hsdis-amd64.so, which does not seem to be readily available on common Linux distributions. You might have to compile it yourself from the OpenJDK sources.

What you get is a huge chunk of assembler code for getNext and main.

In my run, the first compilation seems to cover only the first block in main; you can tell by the line numbers in the annotations. It contains funny things like this:

  0x00007fa35505fc5b: add    $0x1,%r8           ;*ladd
                                                ; - weird$CountByOne::getNext@6 (line 12)
                                                ; - weird::main@28 (line 31)
  0x00007fa35505fc5f: mov    %r8,0x10(%rbx)     ;*putfield i
                                                ; - weird$CountByOne::getNext@7 (line 12)
                                                ; - weird::main@28 (line 31)
  0x00007fa35505fc63: add    $0x1,%r14          ;*ladd
                                                ; - weird::main@31 (line 31)

(Indeed it is very long, due to unrolling and inlining of the loop)

Apparently, during the recompilation of main, both the second and the third block are compiled. The second block looks very similar to the first version. (Again, just an excerpt.)

  0x00007fa35505f05d: add    $0x1,%r8           ;*ladd
                                                ; - weird$CountByOne::getNext@6 (line 12)
                                                ; - weird::main@101 (line 42)
  0x00007fa35505f061: mov    %r8,0x10(%rbx)     ;*putfield i
                                                ; - weird$CountByOne::getNext@7 (line 12)
                                                ; - weird::main@101 (line 42)
  0x00007fa35505f065: add    $0x1,%r13          ;*ladd

HOWEVER, the third block is compiled differently: without inlining and without unrolling.

This time the entire loop looks like this:

  0x00007fa35505f20c: xor    %r10d,%r10d
  0x00007fa35505f20f: xor    %r8d,%r8d          ;*lload
                                                ; - weird::main@171 (line 53)
  0x00007fa35505f212: mov    %r8d,0x10(%rsp)
  0x00007fa35505f217: mov    %r10,0x8(%rsp)
  0x00007fa35505f21c: mov    %rbp,%rsi
  0x00007fa35505f21f: callq  0x00007fa355037c60  ; OopMap{rbp=Oop off=580}
                                                ;*invokevirtual getNext
                                                ; - weird::main@174 (line 53)
                                                ;   {optimized virtual_call}
  0x00007fa35505f224: mov    0x8(%rsp),%r10
  0x00007fa35505f229: add    %rax,%r10          ;*ladd
                                                ; - weird::main@177 (line 53)
  0x00007fa35505f22c: mov    0x10(%rsp),%r8d
  0x00007fa35505f231: inc    %r8d               ;*iinc
                                                ; - weird::main@180 (line 52)
  0x00007fa35505f234: cmp    $0x5f5e100,%r8d
  0x00007fa35505f23b: jl     0x00007fa35505f212  ;*if_icmpge
                                                ; - weird::main@168 (line 52)

My guess is that the JIT concluded that this part of the code is not used much, because it was relying on profiling information gathered while the second block was executing, and therefore did not optimize it heavily. The JIT also appears to be lazy, in the sense that it does not recompile a method once all relevant parts have been compiled. Remember that the first compilation result did not contain any code for the second or third block AT all, so the JIT had to recompile main.
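
If that guess is right, the slow third block comes from main being compiled while it is already running, with a profile that never saw the third loop. One common way to sidestep this is to move the loop into its own method, so that later blocks call an ordinary compiled method instead of depending on how main was recompiled. The original benchmark is not shown here, so the sketch below is a hypothetical reconstruction: only the names CountByOne and getNext, the field i, and the loop bound of 100,000,000 (0x5f5e100) come from the logs above. WarmupSketch, sumNext, and the body of getNext are assumptions, and whether this removes the slowdown on a given JVM has not been verified.

```java
// Hypothetical reconstruction: the real benchmark source is not shown above.
class WarmupSketch {
    static class CountByOne {
        long i;

        // Assumed body: the assembly only shows an add of 1 and a store to i.
        long getNext() {
            return ++i;
        }
    }

    // One "block" of the benchmark, moved out of main so that it is
    // profiled and compiled as a method of its own.
    static long sumNext(CountByOne counter, int iterations) {
        long sum = 0;
        for (int k = 0; k < iterations; k++) {
            sum += counter.getNext();
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 100_000_000; // 0x5f5e100, the bound seen in the assembly
        for (int block = 1; block <= 3; block++) {
            long start = System.nanoTime();
            long sum = sumNext(new CountByOne(), n);
            long elapsedNs = System.nanoTime() - start;
            System.out.println("block " + block + ": sum=" + sum
                    + ", elapsed ns=" + elapsedNs);
        }
    }
}
```

Running it with -XX:+PrintCompilation should show whether sumNext gets a regular (non-%) compilation after the first block, which is the thing to check before trusting the timings.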
