Why is if (variable1 % variable2 == 0) inefficient?

我寻月下人不归 2021-01-29 18:51

I am new to Java, and was running some code last night, and this really bothered me. I was building a simple program to display every Xth output in a for loop, and I noticed a MA

4 answers
  •  温柔的废话
    2021-01-29 19:23

    You are measuring the OSR (on-stack replacement) stub.

    An OSR stub is a special version of a compiled method, intended specifically for transferring execution from interpreted mode to compiled code while the method is running.
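    You can observe an OSR compilation happening with the standard HotSpot flag -XX:+PrintCompilation, where OSR compilations are marked with a % sign. Here is a minimal sketch (the class and method names are made up for illustration, not taken from the answer):

```java
// Run with: java -XX:+PrintCompilation OsrDemo
// In the PrintCompilation log, the OSR compilation of countMultiples
// appears with a '%' marker and a bytecode index ("@ <bci>").
public class OsrDemo {
    // A loop long enough that HotSpot compiles the method while it is
    // still executing, i.e. via on-stack replacement.
    static long countMultiples(long stop, long divisor) {
        long count = 0;
        for (long i = 0; i <= stop; i++) {
            if (i % divisor == 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countMultiples(10_000_000L, 50_000L));
    }
}
```

The exact log format varies between JVM versions, but the % marker for OSR compilations has been stable across HotSpot releases.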

    OSR stubs are not as optimized as regular methods, because they need a frame layout compatible with the interpreted frame. I showed this already in the following answers: 1, 2, 3.

    A similar thing happens here, too. While the "inefficient" code is running its long loop, the method is compiled specially for on-stack replacement right inside the loop. The state is transferred from the interpreted frame to the OSR-compiled method, and this state includes the progressCheck local variable. At this point the JIT cannot replace the variable with a constant, and thus cannot apply optimizations such as strength reduction.

    In particular, this means the JIT does not replace the integer division with a multiplication. (See Why does GCC use multiplication by a strange number in implementing integer division? for the asm trick used by an ahead-of-time compiler when the value is a compile-time constant after inlining / constant propagation, if those optimizations are enabled. An integer literal right in the % expression gets optimized even by gcc -O0, similar to here, where it is optimized by the JIT even in an OSR stub.)
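    As a rough illustration of the strength-reduction trick mentioned above (a sketch of the idea only, not the exact code HotSpot or GCC emits): division of an unsigned 32-bit value by the constant 3 can be replaced by a multiplication by the "magic" reciprocal 0xAAAAAAAB, which is ceil(2^33 / 3), followed by a shift; the remainder is then recovered from the quotient:

```java
public class MagicDiv {
    // Divide a non-negative int by 3 using multiply + shift instead of a
    // hardware division. 0xAAAAAAABL == ceil(2^33 / 3); the identity
    // floor(x * M / 2^33) == x / 3 holds for all 32-bit unsigned inputs,
    // so in particular for every non-negative Java int.
    static int divBy3(int x) {
        return (int) (((x & 0xFFFFFFFFL) * 0xAAAAAAABL) >>> 33);
    }

    // x % 3 expressed via the multiplicative division above:
    // remainder = dividend - divisor * quotient.
    static int modBy3(int x) {
        return x - 3 * divBy3(x);
    }

    public static void main(String[] args) {
        for (int x : new int[]{0, 1, 2, 3, 100, 50_000, Integer.MAX_VALUE}) {
            if (divBy3(x) != x / 3 || modBy3(x) != x % 3) {
                throw new AssertionError("mismatch at " + x);
            }
        }
        System.out.println("multiply-shift matches / and % for all samples");
    }
}
```

    The JIT applies the same kind of transformation, but only when the divisor is known to be a constant at compile time, which is exactly what the OSR stub cannot prove for progressCheck.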

    However, if you run the same method several times, the second and subsequent runs will execute the regular (non-OSR) code, which is fully optimized. Here is a benchmark to prove the theory (benchmarked using JMH):

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.State;
    import org.openjdk.jmh.infra.Blackhole;

    @State(Scope.Benchmark)
    public class Div {

        @Benchmark
        public void divConst(Blackhole blackhole) {
            long startNum = 0;
            long stopNum = 100000000L;

            for (long i = startNum; i <= stopNum; i++) {
                if (i % 50000 == 0) {
                    blackhole.consume(i);
                }
            }
        }

        @Benchmark
        public void divVar(Blackhole blackhole) {
            long startNum = 0;
            long stopNum = 100000000L;
            long progressCheck = 50000;

            for (long i = startNum; i <= stopNum; i++) {
                if (i % progressCheck == 0) {
                    blackhole.consume(i);
                }
            }
        }
    }

    And the results:

    # Benchmark: bench.Div.divConst
    
    # Run progress: 0,00% complete, ETA 00:00:16
    # Fork: 1 of 1
    # Warmup Iteration   1: 126,967 ms/op
    # Warmup Iteration   2: 105,660 ms/op
    # Warmup Iteration   3: 106,205 ms/op
    Iteration   1: 105,620 ms/op
    Iteration   2: 105,789 ms/op
    Iteration   3: 105,915 ms/op
    Iteration   4: 105,629 ms/op
    Iteration   5: 105,632 ms/op
    
    
    # Benchmark: bench.Div.divVar
    
    # Run progress: 50,00% complete, ETA 00:00:09
    # Fork: 1 of 1
    # Warmup Iteration   1: 844,708 ms/op          <-- much slower!
    # Warmup Iteration   2: 105,893 ms/op          <-- as fast as divConst
    # Warmup Iteration   3: 105,601 ms/op
    Iteration   1: 105,570 ms/op
    Iteration   2: 105,475 ms/op
    Iteration   3: 105,702 ms/op
    Iteration   4: 105,535 ms/op
    Iteration   5: 105,766 ms/op
    

    The very first iteration of divVar is indeed much slower, because of the inefficiently compiled OSR stub. But as soon as the method restarts from the beginning, the new, unconstrained version is executed, which leverages all the available compiler optimizations.
