Performance Explanation: code runs slower after warm up

误落风尘 2021-02-01 20:44

The code below runs the exact same calculation 3 times (it does not do much: basically adding all numbers from 1 to 100m). The first 2 blocks run approximately 10 times faster t

2 Answers
  • 2021-02-01 21:06

    You need to place each loop in a different method. The reason is that the JIT collects statistics on how the code is run and uses them to optimise it. However, a method is optimised once it has been called 10,000 times or a loop in it has run 10,000 times.

    In your case, the first loop triggers optimisation of the whole method, even though the later loops have not run yet, so no statistics have been collected for them. Place each loop in its own method and this won't happen.
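
    A minimal sketch of that restructuring follows. The question's original code is not shown here, so the class name LoopPerMethod, the CountByOne stand-in and the method sumBlock are all assumptions for illustration. Because the three blocks do identical work, the sketch calls one method three times rather than writing three copies. Timings will vary by JVM and machine, so none are claimed.

```java
public class LoopPerMethod {

    // Hypothetical stand-in for the counter class used in the question.
    static final class CountByOne {
        private long i = 0;

        long getNext() {
            return ++i; // yields 1, 2, 3, ...
        }
    }

    // One block of work in its own method, so the JIT profiles and
    // compiles this loop on its own instead of as one part of a large main().
    static long sumBlock(int iterations) {
        CountByOne counter = new CountByOne();
        long sum = 0;
        for (int k = 0; k < iterations; k++) {
            sum += counter.getNext();
        }
        return sum;
    }

    public static void main(String[] args) {
        final int n = 100_000_000;
        for (int block = 1; block <= 3; block++) {
            long start = System.nanoTime();
            long sum = sumBlock(n);
            long elapsedNanos = System.nanoTime() - start;
            System.out.println("block " + block + ": sum=" + sum
                    + ", elapsed ns=" + elapsedNanos);
        }
    }
}
```

    Running it with -XX:+PrintCompilation should list sumBlock as a separate compilation unit, which makes it easier to see which loop is being compiled.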

  • 2021-02-01 21:08

    Short: The Just In Time Compiler is dumb.

    First of all you can use the option -XX:+PrintCompilation to see WHEN the JIT is doing something. Then you will see something like this:

    $ java -XX:+PrintCompilation weird
        168    1             weird$CountByOne::getNext (28 bytes)
        174    1 %           weird::main @ 18 (220 bytes)
        279    1 %           weird::main @ -2 (220 bytes)   made not entrant
    113727636
        280    2 %           weird::main @ 91 (220 bytes)
    106265475
    427228826
    

    So you see that the method main is compiled at some point during the first block and again during the second block.
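
    If you would rather observe this from inside the program, a rough complement to -XX:+PrintCompilation is the standard CompilationMXBean, which reports the accumulated JIT compilation time. This is only a sketch: the class JitTimeProbe and its helper methods are made up for illustration, and the bean gives a coarse total in milliseconds, not per-method events.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

public class JitTimeProbe {

    // Accumulated JIT compilation time in milliseconds,
    // or -1 if this JVM does not report it.
    static long jitMillis() {
        CompilationMXBean bean = ManagementFactory.getCompilationMXBean();
        if (bean == null || !bean.isCompilationTimeMonitoringSupported()) {
            return -1;
        }
        return bean.getTotalCompilationTime();
    }

    // Simple workload: adds all numbers from 1 to n.
    static long sumTo(int n) {
        long sum = 0;
        for (int k = 1; k <= n; k++) {
            sum += k;
        }
        return sum;
    }

    public static void main(String[] args) {
        for (int block = 1; block <= 3; block++) {
            long before = jitMillis();
            long sum = sumTo(100_000_000);
            long after = jitMillis();
            // A growing total between "before" and "after" suggests the JIT
            // was compiling something while this block ran.
            System.out.println("block " + block + ": sum=" + sum
                    + ", JIT ms before=" + before + ", after=" + after);
        }
    }
}
```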

    Adding the options -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly will give you more information about what the JIT is doing. Note that this requires hsdis-amd64.so, which does not seem to be readily available on common Linux distributions. You might have to compile it yourself from the OpenJDK sources.

    What you get is a huge chunk of assembler code for getNext and main.

    For me, it seems that in the first compilation only the first block in main is actually compiled; you can tell by the line numbers. It contains funny things like this:

      0x00007fa35505fc5b: add    $0x1,%r8           ;*ladd
                                                    ; - weird$CountByOne::getNext@6 (line 12)
                                                    ; - weird::main@28 (line 31)
      0x00007fa35505fc5f: mov    %r8,0x10(%rbx)     ;*putfield i
                                                    ; - weird$CountByOne::getNext@7 (line 12)
                                                    ; - weird::main@28 (line 31)
      0x00007fa35505fc63: add    $0x1,%r14          ;*ladd
                                                    ; - weird::main@31 (line 31)
    

    (Indeed it is very long, due to unrolling and inlining of the loop)

    Apparently, during the recompilation of main, the second AND third blocks are compiled. The second block looks very similar to the first version. (Again, just an excerpt.)

      0x00007fa35505f05d: add    $0x1,%r8           ;*ladd
                                                    ; - weird$CountByOne::getNext@6 (line 12)
                                                    ; - weird::main@101 (line 42)
      0x00007fa35505f061: mov    %r8,0x10(%rbx)     ;*putfield i
                                                    ; - weird$CountByOne::getNext@7 (line 12)
                                                    ; - weird::main@101 (line 42)
      0x00007fa35505f065: add    $0x1,%r13          ;*ladd
    

    HOWEVER, the third block is compiled differently: without inlining and unrolling.

    This time the entire loop looks like this:

      0x00007fa35505f20c: xor    %r10d,%r10d
      0x00007fa35505f20f: xor    %r8d,%r8d          ;*lload
                                                    ; - weird::main@171 (line 53)
      0x00007fa35505f212: mov    %r8d,0x10(%rsp)
      0x00007fa35505f217: mov    %r10,0x8(%rsp)
      0x00007fa35505f21c: mov    %rbp,%rsi
      0x00007fa35505f21f: callq  0x00007fa355037c60  ; OopMap{rbp=Oop off=580}
                                                    ;*invokevirtual getNext
                                                    ; - weird::main@174 (line 53)
                                                    ;   {optimized virtual_call}
      0x00007fa35505f224: mov    0x8(%rsp),%r10
      0x00007fa35505f229: add    %rax,%r10          ;*ladd
                                                    ; - weird::main@177 (line 53)
      0x00007fa35505f22c: mov    0x10(%rsp),%r8d
      0x00007fa35505f231: inc    %r8d               ;*iinc
                                                    ; - weird::main@180 (line 52)
      0x00007fa35505f234: cmp    $0x5f5e100,%r8d
      0x00007fa35505f23b: jl     0x00007fa35505f212  ;*if_icmpge
                                                    ; - weird::main@168 (line 52)
    

    My guess is that the JIT decided this part of the code was not used much, since it was relying on profiling information gathered while the second block was executing, and therefore did not optimize it heavily. The JIT also appears to be lazy in the sense that it does not recompile a method once all relevant parts have been compiled. Remember that the first compilation result did not contain any code for the second or third block AT all, so the JIT had to recompile main.
