The code below runs the exact same calculation 3 times (it does not do much: basically adding all numbers from 1 to 100m). The first 2 blocks run approximately 10 times faster t
Short: The Just In Time Compiler is dumb.
First of all, you can use the option -XX:+PrintCompilation
to see WHEN the JIT is doing something. You will then see something like this:
$ java -XX:+PrintCompilation weird
168 1 weird$CountByOne::getNext (28 bytes)
174 1 % weird::main @ 18 (220 bytes)
279 1 % weird::main @ -2 (220 bytes) made not entrant
113727636
280 2 % weird::main @ 91 (220 bytes)
106265475
427228826
So you can see that the method main is compiled at some point during the first block and again during the second block.
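If you would rather observe JIT activity from inside the program than read the log, the standard java.lang.management API exposes the accumulated compilation time. The following is only a minimal sketch: the class name JitTimeProbe and the helper jitMillis are made up for illustration, and the loop is a stand-in for the benchmark, not the original code.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper class, not part of the original benchmark.
class JitTimeProbe {
    // Returns the accumulated JIT compilation time in milliseconds, or -1 if
    // the JVM has no compilation system or does not support this monitoring.
    static long jitMillis() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null || !jit.isCompilationTimeMonitoringSupported()) {
            return -1;
        }
        return jit.getTotalCompilationTime();
    }

    public static void main(String[] args) {
        long before = jitMillis();
        long sum = 0;
        // Stand-in workload: a hot loop that should trigger compilation.
        for (int k = 1; k <= 100_000_000; k++) {
            sum += k;
        }
        long after = jitMillis();
        System.out.println("sum=" + sum + ", JIT ms during loop=" + (after - before));
    }
}
```

The value is approximate and covers all compiler threads, so it only tells you that compilation happened in a given interval, not which method was compiled; for that, the -XX:+PrintCompilation log is still the better tool.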
Adding the options -XX:+PrintCompilation -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly
will give you more information about what the JIT is doing. Note that this requires hsdis-amd64.so,
which does not seem to be readily available on common Linux distributions. You might have to compile it yourself from the OpenJDK sources.
What you get is a huge chunk of assembler code for getNext and main.
In my run, the first compilation seems to cover only the first block in main; you can tell by the line numbers in the comments. It contains funny things like this:
0x00007fa35505fc5b: add $0x1,%r8 ;*ladd
; - weird$CountByOne::getNext@6 (line 12)
; - weird::main@28 (line 31)
0x00007fa35505fc5f: mov %r8,0x10(%rbx) ;*putfield i
; - weird$CountByOne::getNext@7 (line 12)
; - weird::main@28 (line 31)
0x00007fa35505fc63: add $0x1,%r14 ;*ladd
; - weird::main@31 (line 31)
(The full listing is very long because the loop is unrolled and getNext is inlined.)
Apparently, during the recompilation of main, the second AND third blocks are compiled. The second block looks very similar to the first version (again, just an excerpt):
0x00007fa35505f05d: add $0x1,%r8 ;*ladd
; - weird$CountByOne::getNext@6 (line 12)
; - weird::main@101 (line 42)
0x00007fa35505f061: mov %r8,0x10(%rbx) ;*putfield i
; - weird$CountByOne::getNext@7 (line 12)
; - weird::main@101 (line 42)
0x00007fa35505f065: add $0x1,%r13 ;*ladd
HOWEVER, the third block is compiled differently, without inlining and unrolling.
This time the entire loop looks like this:
0x00007fa35505f20c: xor %r10d,%r10d
0x00007fa35505f20f: xor %r8d,%r8d ;*lload
; - weird::main@171 (line 53)
0x00007fa35505f212: mov %r8d,0x10(%rsp)
0x00007fa35505f217: mov %r10,0x8(%rsp)
0x00007fa35505f21c: mov %rbp,%rsi
0x00007fa35505f21f: callq 0x00007fa355037c60 ; OopMap{rbp=Oop off=580}
;*invokevirtual getNext
; - weird::main@174 (line 53)
; {optimized virtual_call}
0x00007fa35505f224: mov 0x8(%rsp),%r10
0x00007fa35505f229: add %rax,%r10 ;*ladd
; - weird::main@177 (line 53)
0x00007fa35505f22c: mov 0x10(%rsp),%r8d
0x00007fa35505f231: inc %r8d ;*iinc
; - weird::main@180 (line 52)
0x00007fa35505f234: cmp $0x5f5e100,%r8d
0x00007fa35505f23b: jl 0x00007fa35505f212 ;*if_icmpge
; - weird::main@168 (line 52)
My guess is that the JIT decided this part of the code is not used much, because it relied on profiling information gathered while the second block was executing (at that point the third block had not run yet), and therefore did not optimize it heavily. The JIT also appears to be lazy, in the sense that it does not recompile a method again once all relevant parts have been compiled. Remember that the first compilation result did not contain any code for the second/third block AT all, so the JIT had to recompile main.
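One way to probe this guess is to move the loop out of main into its own method, so that all three blocks go through the same code and the JIT does not have to compile a block it has never seen executing. The following is a minimal sketch under assumptions: only the names CountByOne and getNext are taken from the log above; the class SeparateMethods, the method sumNext, and the body of getNext are hypothetical reconstructions, not the original code.

```java
// Hypothetical reconstruction for illustration; only the names CountByOne and
// getNext appear in the JIT log, everything else is an assumption.
class SeparateMethods {
    static class CountByOne {
        private long i = 0;

        // Assumed behavior: increment the counter and return the new value.
        long getNext() {
            i += 1;
            return i;
        }
    }

    // The loop lives in its own method instead of being repeated in main,
    // so every block executes the same (possibly compiled) code.
    static long sumNext(CountByOne counter, int iterations) {
        long sum = 0;
        for (int k = 0; k < iterations; k++) {
            sum += counter.getNext();
        }
        return sum;
    }

    public static void main(String[] args) {
        final int iterations = 100_000_000; // 0x5f5e100, as in the assembly above
        for (int block = 1; block <= 3; block++) {
            CountByOne counter = new CountByOne();
            long start = System.nanoTime();
            long sum = sumNext(counter, iterations);
            long elapsed = System.nanoTime() - start;
            System.out.println("block " + block + ": sum=" + sum + ", ns=" + elapsed);
        }
    }
}
```

Running this with -XX:+PrintCompilation should show whether sumNext is compiled once and then reused. Whether the timings of the three blocks actually even out depends on the JVM version and its compilation heuristics, so treat it as an experiment rather than a guaranteed fix.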