Why is this inner loop 4X faster the first iteration through the outer loop?

孤街浪徒 2021-02-05 08:06

I was trying to reproduce some of the processor cache effects described in here. I understand that Java is a managed environment, and these examples will not translate exactly,

1 Answer
  •  天涯浪人
    2021-02-05 08:14

    This is caused by a suboptimal recompilation of the method.

    The JIT compiler relies on run-time statistics gathered during interpretation. When the main method is compiled for the first time, the outer loop has not yet completed its first iteration, so the statistics say that the code after the inner loop has never been executed, and the JIT does not bother compiling it. Instead, it generates an uncommon trap at that point.

    When the inner loop ends for the first time, the uncommon trap is hit, causing the method to be deoptimized.
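The sequence described so far (first compilation, uncommon trap, deoptimization) can be watched with a small timing harness. The sketch below is an assumption about what such a test might look like, since the original test code is not shown here: the class name `LoopTiming`, the method `run`, and the loop bounds are all hypothetical. Both loops are deliberately kept in one method, so that the code after the inner loop has not yet been reached when that method is first compiled.

```java
public class LoopTiming {

    // Hypothetical harness: both loops live in one method, so the first
    // compilation can happen while the inner loop is still running.
    static long[] run(int[] a, int outerIterations, int innerIterations) {
        long[] millis = new long[outerIterations];
        for (int k = 0; k < outerIterations; k++) {
            long start = System.nanoTime();
            for (int i = 0; i < innerIterations; i++) {
                a[0]++; // the array update discussed in the answer
            }
            // Code after the inner loop: not yet executed during the first pass.
            millis[k] = (System.nanoTime() - start) / 1_000_000;
        }
        return millis;
    }

    public static void main(String[] args) {
        int[] a = new int[1];
        long[] millis = run(a, 10, 1 << 30); // assumed loop bounds
        for (int k = 0; k < millis.length; k++) {
            System.out.printf("Time for loop# %2d: %5d ms%n", k, millis[k]);
        }
        // Print the counter so the loop result stays observable (it wraps around).
        System.out.println("a[0] = " + a[0]);
    }
}
```

Running it with `java -XX:+PrintCompilation LoopTiming` should list the compilations as they happen, with a deoptimized version typically reported as `made not entrant`. The exact output and timings depend on the JVM version and hardware.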

    On the second iteration of the outer loop, the main method is recompiled with the new knowledge. The JIT now has more statistics and more context to work with. For some reason, this time it does not cache the value of a[0] in a register (probably because the JIT is fooled by the wider context). So it generates an addl instruction that updates the array in memory, which is effectively a combined memory load and store.

    By contrast, during the first compilation the JIT caches the value of a[0] in a register, so there is only a mov instruction that stores the value to memory (without a load).

    Fast loop (first iteration):

    0x00000000029fc562: mov    %ecx,0x10(%r14)   <<< array store
    0x00000000029fc566: mov    %r11d,%edi
    0x00000000029fc569: mov    %r9d,%ecx
    0x00000000029fc56c: add    %edi,%ecx
    0x00000000029fc56e: mov    %ecx,%r11d
    0x00000000029fc571: add    $0x10,%r11d       <<< increment in register
    0x00000000029fc575: mov    %r11d,0x10(%r14)  <<< array store
    0x00000000029fc579: add    $0x11,%ecx
    0x00000000029fc57c: mov    %edi,%r11d
    0x00000000029fc57f: add    $0x10,%r11d
    0x00000000029fc583: cmp    $0x3ffffff2,%r11d
    0x00000000029fc58a: jl     0x00000000029fc562
    

    Slow loop (after recompilation):

    0x00000000029fa1b0: addl   $0x10,0x10(%r14)  <<< increment in memory
    0x00000000029fa1b5: add    $0x10,%r13d
    0x00000000029fa1b9: cmp    $0x3ffffff1,%r13d
    0x00000000029fa1c0: jl     0x00000000029fa1b0
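At the source level, the difference between the two listings corresponds roughly to the two loops below. This is only an illustration of "value kept in a register, store only" versus "read-modify-write in memory"; the class `HoistedCounter` and both method names are hypothetical, and nothing here guarantees that the JIT compiles either form in a particular way.

```java
public class HoistedCounter {

    // Updates the array element directly: as written, each iteration
    // reads a[0], increments it, and writes it back.
    static void incrementInMemory(int[] a, int n) {
        for (int i = 0; i < n; i++) {
            a[0]++;
        }
    }

    // Keeps the running value in a local variable and only stores it,
    // so the loop body never has to read a[0] back from the array.
    static void incrementViaLocal(int[] a, int n) {
        int v = a[0];
        for (int i = 0; i < n; i++) {
            v++;
            a[0] = v;
        }
    }

    public static void main(String[] args) {
        int[] x = new int[1];
        int[] y = new int[1];
        incrementInMemory(x, 1_000);
        incrementViaLocal(y, 1_000);
        System.out.println(x[0] + " " + y[0]); // prints "1000 1000"
    }
}
```

Both methods compute the same result; they differ only in whether the source asks for a load from the array on every iteration.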
    

    However, this problem seems to be fixed in JDK 9. I've checked this test against a recent JDK 9 Early Access release and verified that it works as expected:

    Time for loop#  0:   104 ms
    Time for loop#  1:   101 ms
    Time for loop#  2:    91 ms
    Time for loop#  3:    63 ms
    Time for loop#  4:    60 ms
    Time for loop#  5:    60 ms
    Time for loop#  6:    59 ms
    Time for loop#  7:    55 ms
    Time for loop#  8:    57 ms
    Time for loop#  9:    59 ms
    
