Java program runs slower when code that is never executed is commented out

前端 未结 3 1768
不思量自难忘°
不思量自难忘° 2021-01-30 13:00

I observed some strange behaviour in one of my Java programs. I have tried to strip the code down as much as possible while still being able to replicate the behaviour. Code in

相关标签:
3条回答
  • 2021-01-30 13:05

    I am with @k5_, it seems that there exists a threshold to determine whether to inline a function. And if JIT compiler decided to inline it, it will cause a lot of work and time to do it as -XX:+PrintCompilation shows:

      task-id
        158   32       3       so_test.StrangeBehaviour::functionB (326 bytes)   made not entrant
        159   35       3       java.lang.String::<init> (82 bytes)
        160   36  s    1       java.util.Vector::size (5 bytes)
        1878   37 %     3       so_test.StrangeBehaviour::main @ 6 (65 bytes)
        1898   38       3       so_test.StrangeBehaviour::main (65 bytes)
        2665   39       3       java.util.regex.Pattern::has (15 bytes)
        2667   40       3       sun.misc.FDBigInteger::mult (64 bytes)
        2668   41       3       sun.misc.FDBigInteger::<init> (30 bytes)
        2668   42       3       sun.misc.FDBigInteger::trimLeadingZeros (57 bytes)
        2.51 seconds elapsed.
    

    The upper is info without comment, the following is with a comment which reduce the method size from 326 byte to 318 byte. And you can notice the task id in the column 1 of output, is so much larger in latter, which causes more time.

      task-id
        126   35       4       so_test.StrangeBehaviour::functionA (12 bytes)
        130   33       3       so_test.StrangeBehaviour::functionA (12 bytes)   made not entrant
        131   36  s    1       java.util.Vector::size (5 bytes)
        14078   37 %     3       so_test.StrangeBehaviour::main @ 6 (65 bytes)
        14296   38       3       so_test.StrangeBehaviour::main (65 bytes)
        14296   39 %     4       so_test.StrangeBehaviour::functionB @ 2 (318 bytes)
        14300   40       4       so_test.StrangeBehaviour::functionB (318 bytes)
        14304   34       3       so_test.StrangeBehaviour::functionB (318 bytes)   made not entrant
        14628   41       3       java.util.regex.Pattern::has (15 bytes)
        14631   42       3       sun.misc.FDBigInteger::mult (64 bytes)
        14632   43       3       sun.misc.FDBigInteger::<init> (30 bytes)
        14632   44       3       sun.misc.FDBigInteger::trimLeadingZeros (57 bytes)
        14.50 seconds elapsed.
    

    And if you change the code into following (add two line and commnet out a print line), you can see that the code size change into 326 byte, and runs faster now:

            if (StrangeBehaviour.recursionFlag) {
                int a = 1;
                int b = 1;
                if (recursionSwitch == 0) {
                    if (functionA(recursionDepth - 1, 1 - recursionSwitch)) return true;
                } else {
                    if (!functionA(recursionDepth - 1, 1 - recursionSwitch)) return false;
                }
            } else {
                // This block is never entered into.
                // Yet commenting out one of the lines below makes the program run slower!
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
              //System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
                System.out.println("...");
            }
    

    New time and JIT compiler info:

        140   34       3       so_test.StrangeBehaviour::functionB (326 bytes)   made not entrant
        145   36       3       java.lang.String::<init> (82 bytes)
        148   37  s    1       java.util.Vector::size (5 bytes)
        162   38       4       so_test.StrangeBehaviour::functionA (12 bytes)
        163   33       3       so_test.StrangeBehaviour::functionA (12 bytes)   made not entrant
        1916   39 %     3       so_test.StrangeBehaviour::main @ 6 (65 bytes)
        1936   40       3       so_test.StrangeBehaviour::main (65 bytes)
        2686   41       3       java.util.regex.Pattern::has (15 bytes)
        2689   42       3       sun.misc.FDBigInteger::mult (64 bytes)
        2690   43       3       sun.misc.FDBigInteger::<init> (30 bytes)
        2690   44       3       sun.misc.FDBigInteger::trimLeadingZeros (57 bytes)
        2.55 seconds elapsed.
    

    In conclusion:

    • When method size exceeds some limits, JIT will not inline this function;
    • And if we comment out a line, which reduce to a size below the threshold, JIT decide to inline it;
    • Inlining that function causes a lot of JIT tasks which slows down the program.

    update:

    Accroding to my latest trial, the answer for this question is not so easy:

    As my code sample shows, a normal inline optimization will

    • accelerates program
    • and not costs much compiler work (in my test, it even cost less work when inline happens).

    But in this problem, the code causes much JIT work and slows down program which seems a bug of JIT. And it is still not clear why it causes so much work of JIT.

    0 讨论(0)
  • 2021-01-30 13:10

    The complete answer is a combination of k5_ and Tony's answers.

    The code that the OP posted omits a warmup loop to trigger HotSpot compilation before doing the benchmark; hence the 10-fold (on my computer) speedup when the print statements are included, combines both the time spent in HotSpot to compile the bytecode to CPU instructions, as well as the actual running of the CPU instructions.

    If I add a separate warmup loop before the timing loop, there is only a 2.5-fold speedup with the print statement.

    That indicates that both the HotSpot/JIT compilation takes longer when the method is inlined (as Tony explained) as well as that the running of the code takes longer, probably because of worse cache or branch-prediction/pipelining performance, as k5_ showed.

    public static void main(String[] args) {
        // Added the following warmup loop before the timing loop
        for (int i = 0; i < 50000; i++) {
            functionA(6, 0);
        }
    
        long startTime = System.nanoTime();
        for (int i = 0; i < 50000; i++) {
            functionA(6, 0);
        }
        long endTime = System.nanoTime();
        System.out.format("%.2f seconds elapsed.\n", (endTime - startTime) / 1000.0 / 1000 / 1000);
    }
    
    0 讨论(0)
  • 2021-01-30 13:21

    The commented code affects how inlining is handled. If functionB gets longer/bigger (more bytecode instructions) it will not be inlined into functionA.

    So @J3D1 was able to use VMOptions to manually switch off inlining for functionB(): -XX:CompileCommand=dontinline,com.jd.benchmarking.StrangeBeh‌​aviour::functionB This appears to eliminate the delay with the shorter function.

    with the vm options you can display inlining -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining

    The bigger Version, wont inline functionB

    @ 8   StrangeBehaviour::functionB (326 bytes)   callee is too large
    @ 21   StrangeBehaviour::functionA (12 bytes)
      @ 8   StrangeBehaviour::functionB (326 bytes)   callee is too large
    @ 35   StrangeBehaviour::functionA (12 bytes)
      @ 8   StrangeBehaviour::functionB (326 bytes)   callee is too large
    

    The shorter Version will try to inline functionB, causing a few of further attempts.

    @ 8   StrangeBehaviour::functionB (318 bytes)   inline (hot)
     @ 21   StrangeBehaviour::functionA (12 bytes)   inline (hot)
       @ 8   StrangeBehaviour::functionB (318 bytes)   inline (hot)
         @ 35   StrangeBehaviour::functionA (12 bytes)   recursive inlining is too deep
     @ 35   StrangeBehaviour::functionA (12 bytes)   inline (hot)
       @ 8   StrangeBehaviour::functionB (318 bytes)   inline (hot)
         @ 21   StrangeBehaviour::functionA (12 bytes)   recursive inlining is too deep
         @ 35   StrangeBehaviour::functionA (12 bytes)   recursive inlining is too deep
    @ 21   StrangeBehaviour::functionA (12 bytes)   inline (hot)
     @ 8   StrangeBehaviour::functionB (318 bytes)   inline (hot)
       @ 35   StrangeBehaviour::functionA (12 bytes)   inline (hot)
         @ 8   StrangeBehaviour::functionB (318 bytes)   recursive inlining is too deep
    @ 35   StrangeBehaviour::functionA (12 bytes)   inline (hot)
     @ 8   StrangeBehaviour::functionB (318 bytes)   inline (hot)
       @ 21   StrangeBehaviour::functionA (12 bytes)   inline (hot)
        @ 8   StrangeBehaviour::functionB (318 bytes)   recursive inlining is too deep
       @ 35   StrangeBehaviour::functionA (12 bytes)   inline (hot)
         @ 8   StrangeBehaviour::functionB (318 bytes)   recursive inlining is too deep
    

    Mostly guessing, but the bigger/inlined bytecode will cause problems with branch prediction and caching

    0 讨论(0)
提交回复
热议问题