CAS vs synchronized performance

北恋 2021-02-08 09:10

I've had this question for quite a while now. I've read lots of resources and tried to understand what is going on, but I've still failed to get a good understanding of why th

4 answers
  •  心在旅途
    2021-02-08 09:56

    You should read, re-read, and accept @Holger's excellent answer, as the insights it provides are far more valuable than a single set of benchmark numbers from one developer's workstation.

    I tweaked your benchmarks to make them a bit more apples-to-apples, but if you read @Holger's answer, you'll see why this isn't a terribly useful test. I'm going to include my changes and my results simply to show how results can vary from one machine (or one JRE version) to another.

    First, my version of the benchmarks:

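    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicLong;

    import org.openjdk.jmh.annotations.*;
    import org.openjdk.jmh.runner.Runner;
    import org.openjdk.jmh.runner.RunnerException;
    import org.openjdk.jmh.runner.options.OptionsBuilder;
    import org.openjdk.jmh.runner.options.TimeValue;

    @State(Scope.Benchmark)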
    public class SandBox {
        public static void main(String[] args) throws RunnerException {
            new Runner(
                new OptionsBuilder().include(SandBox.class.getSimpleName())
                                    .shouldFailOnError(true)
                                    .mode(Mode.AverageTime)
                                    .timeUnit(TimeUnit.NANOSECONDS)
                                    .warmupIterations(5)
                                    .warmupTime(TimeValue.seconds(5))
                                    .measurementIterations(5)
                                    .measurementTime(TimeValue.seconds(5))
                                    .threads(-1) // -1 is Threads.MAX: one thread per available processor
                                    .build()
            ).run();
        }
    
        private long number = 0xCAFEBABECAFED00DL;
        private final Object lock = new Object();
        private final AtomicLong atomicNumber = new AtomicLong(number);
    
        @Setup(Level.Iteration)
        public void setUp() {
            number = 0xCAFEBABECAFED00DL;
            atomicNumber.set(number);
        }
    
        @Fork(1)
        @Benchmark
        @CompilerControl(CompilerControl.Mode.DONT_INLINE)
        public long casShared() {
            return atomicNumber.updateAndGet(x -> x * 123L);
        }
    
        @Fork(1)
        @Benchmark
        @CompilerControl(CompilerControl.Mode.DONT_INLINE)
        public long syncShared() {
            synchronized (lock) {
                return number *= 123L;
            }
        }
    
        @Fork(value = 1, jvmArgsAppend = "-XX:-UseBiasedLocking")
        @Benchmark
        @CompilerControl(CompilerControl.Mode.DONT_INLINE)
        public long syncSharedNonBiased() {
            synchronized (lock) {
                return number *= 123L;
            }
        }
    }
    

    And then my first batch of results:

    # VM version: JDK 1.8.0_60, VM 25.60-b23
    
    Benchmark                    Mode  Cnt     Score     Error  Units
    SandBox.casShared            avgt    5   976.215 ± 167.865  ns/op
    SandBox.syncShared           avgt    5  1820.554 ±  91.883  ns/op
    SandBox.syncSharedNonBiased  avgt    5  1996.305 ± 124.681  ns/op
    

    Recall that you saw synchronized coming out ahead under high contention. On my workstation, the atomic version fared better. If you use my version of your benchmarks, what results do you see? It won't surprise me in the least if they're substantially different.
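
    To see why contention matters for the atomic version, it helps to remember that updateAndGet is essentially a read-compute-compareAndSet retry loop. The following is a minimal sketch of such a loop, not the JDK source; the class CasLoopSketch, the method updateAndGetCounting, and the failedAttempts counter are hypothetical names introduced only for illustration. Every failed compareAndSet means the update function ran for nothing and has to run again.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongUnaryOperator;

final class CasLoopSketch {
    // Hypothetical diagnostic counter, not part of the benchmark above.
    // Note that it is itself a shared atomic, so it adds some contention of its own.
    static final AtomicLong failedAttempts = new AtomicLong();

    // Hand-written retry loop in the spirit of AtomicLong.updateAndGet:
    // read the current value, compute the new one, try to publish it, retry on failure.
    static long updateAndGetCounting(AtomicLong target, LongUnaryOperator f) {
        while (true) {
            long prev = target.get();
            long next = f.applyAsLong(prev);
            if (target.compareAndSet(prev, next)) {
                return next;
            }
            // Another thread changed the value in between; the work is redone.
            failedAttempts.incrementAndGet();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicLong value = new AtomicLong(0xCAFEBABECAFED00DL);
        int threads = Runtime.getRuntime().availableProcessors();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < 100_000; j++) {
                    updateAndGetCounting(value, x -> x * 123L);
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        System.out.println("failed CAS attempts: " + failedAttempts.get());
    }
}
```

    The number of failed attempts depends on the core count, the scheduler, and the JVM, which is one more reason the benchmark scores differ between machines.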

    Here's another set run under a months-old Java 9 EA release:

    # VM version: JDK 9-ea, VM 9-ea+170
    
    Benchmark                    Mode  Cnt     Score     Error  Units
    SandBox.casShared            avgt    5   979.615 ± 135.495  ns/op
    SandBox.syncShared           avgt    5  1426.042 ±  52.971  ns/op
    SandBox.syncSharedNonBiased  avgt    5  1649.868 ±  48.410  ns/op
    

    The difference is less dramatic here. It's not unusual to see differences across major JRE versions, but who's to say you won't see them across minor releases too?

    At the end of the day, the results are close. Very close. The performance of synchronized has come a long way since the early Java versions. If you are not writing HFT algorithms or something else that's incredibly latency sensitive, you should prefer the solution that's most easily proven correct. It is generally easier to reason about synchronized than lock-free algorithms and data structures. If you cannot demonstrate a measurable difference in your application, then synchronized is what you should use.
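
    As a rough illustration of the "easier to reason about" point, consider keeping two fields consistent with each other. This is only a sketch with hypothetical names (TwoFieldUpdateSketch, SyncStats, CasStats), not code from the question. With synchronized, both fields simply change under one lock. Without a lock, the pair has to be packed into an immutable object and swapped through an AtomicReference, which is more code and more to get wrong.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicReference;

final class TwoFieldUpdateSketch {
    // Lock-based: count and sum always change together under the same monitor.
    static final class SyncStats {
        private long count;
        private long sum;

        synchronized void record(long v) {
            count++;
            sum += v;
        }

        synchronized long[] snapshot() {
            return new long[] {count, sum};
        }
    }

    // Lock-free: the two values live in one immutable Pair that is replaced atomically.
    static final class CasStats {
        private static final class Pair {
            final long count;
            final long sum;

            Pair(long count, long sum) {
                this.count = count;
                this.sum = sum;
            }
        }

        private final AtomicReference<Pair> state = new AtomicReference<>(new Pair(0, 0));

        void record(long v) {
            // Allocates a new Pair per attempt; retried internally if another thread wins.
            state.updateAndGet(p -> new Pair(p.count + 1, p.sum + v));
        }

        long[] snapshot() {
            Pair p = state.get();
            return new long[] {p.count, p.sum};
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SyncStats sync = new SyncStats();
        CasStats cas = new CasStats();
        Thread[] workers = new Thread[4];
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 1; j <= 10_000; j++) {
                    sync.record(j);
                    cas.record(j);
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();
        }
        // Both variants should report the same count and sum.
        System.out.println(Arrays.toString(sync.snapshot()));
        System.out.println(Arrays.toString(cas.snapshot()));
    }
}
```

    Both variants are correct here, but only the first can be verified by glancing at which lock guards which fields.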
