Strange performance drop of JDK8 LocalDate.toEpochDay


I haven't caught the reason directly, but it is certainly a benchmarking-framework shortcoming, something related to GC and per-invocation costs. I see the same performance degradation with JMH, except that the benchmark with 100 dates shows better performance than the one with 2000 dates, too. I tried creating the dates array always at its maximum size and iterating over only the first 100, 2000, or 30000 elements. With that change all versions performed equally (15.3 ± 0.3 ns on my machine). The JMH benchmark I used:

import org.openjdk.jmh.annotations.*;

import java.time.LocalDate;
import java.util.*;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;


@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OperationsPerInvocation(LocalDateBenchmark.ITERATIONS)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class LocalDateBenchmark {
    public static final int MAX_ITERATIONS = 1000000;
    public static final int ITERATIONS = 30000;

    private static final LocalDate MIN_DATE = LocalDate.of(1900, 1, 1);
    private static final LocalDate MAX_DATE = LocalDate.of(2100, 1, 1);
    private static final int DAYS_BETWEEN = (int) (MAX_DATE.toEpochDay() - MIN_DATE.toEpochDay());

    public LocalDate[] dates = new LocalDate[MAX_ITERATIONS];
    private Random random;

    @Setup(Level.Trial)
    public void setUpAll() {
        Random r = ThreadLocalRandom.current();
        for (int i = 0; i < dates.length; ++i) {
            dates[i] = MIN_DATE.plusDays(r.nextInt(DAYS_BETWEEN));
        }
    }

    @Setup(Level.Iteration)
    public void setUpRandom() {
        random = new Random();
    }

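    // Walk a window of ITERATIONS dates starting at a random offset inside the
    // full MAX_ITERATIONS array, so the array itself is the same size for every
    // variant and only the number of dates actually visited changes.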
    @Benchmark // named @GenerateMicroBenchmark in JMH versions before 1.0
    public int timeToEpochDay(LocalDateBenchmark state) {
        int result = 0;
        LocalDate[] dates = state.dates;
        int offset = random.nextInt(MAX_ITERATIONS - ITERATIONS);
        for (int i = offset; i < offset + ITERATIONS; i++) {
            LocalDate date = dates[i];
            result += date.toEpochDay();
        }
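        // Returning the accumulated sum keeps the JIT from eliminating the toEpochDay() calls as dead code.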
        return result;
    }
}
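
If you want to run this yourself, a minimal launcher works too; this is just a sketch assuming a standard JMH 1.x dependency (the Runner and OptionsBuilder classes below are the stock JMH API, and the class name LocalDateBenchmarkRunner is mine):

import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class LocalDateBenchmarkRunner {
    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(LocalDateBenchmark.class.getSimpleName()) // run only this benchmark
                .forks(1)                                          // one fork is enough for a quick local check
                .warmupIterations(5)
                .measurementIterations(5)
                .build();
        new Runner(opt).run();
    }
}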

That's because there are no real divisions left in the compiled code: all the / 4 divisions are replaced by shifts, and the / 100 divisions are strength-reduced to multiplications by a precomputed reciprocal (a magic constant plus a shift). The divisions are there for readability (hehe). I'm not sure whether this optimization happens during bytecode emission or JIT compilation; it would be interesting to look at the class file and find out.
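
As a rough self-check (my own sketch, not code from the JDK), you can verify the shift and reciprocal-multiply forms against plain division for non-negative values; the class name and the magic constant 1_374_389_535 = ceil(2^37 / 100) are purely illustrative:

public class StrengthReductionDemo {
    public static void main(String[] args) {
        for (int x = 0; x <= 1_000_000; x++) {
            // For non-negative x, x / 4 is a plain right shift.
            int div4 = x >> 2;
            // x / 100 becomes a multiply by the precomputed reciprocal plus a shift.
            int div100 = (int) ((x * 1_374_389_535L) >> 37);
            if (div4 != x / 4 || div100 != x / 100) {
                throw new AssertionError("mismatch at x = " + x);
            }
        }
        System.out.println("shift/multiply versions agree with plain division");
    }
}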
