What is the synchronization cost of calling a synchronized method from a synchronized method?

梦谈多话 2021-01-01 13:56

Is there any difference in performance between this

synchronized void x() {
    y();
}

synchronized void y() {
}

and this

synchronized void x() {
    y();
}

void y() {
}

6 Answers
  • 2021-01-01 14:23

    The test is below (you will have to guess what some helper methods do, but it is nothing complicated):

    It tests each variant with 100 threads and only counts a run toward the averages once the remaining-thread counter has dropped to 70% or below, so the earliest completions act as warmup.

    It prints the results once at the end.

    public static final class Test {
            final int                      iterations     =     100;
            final int                      jiterations    = 1000000;
            final int                      count          = (int) (0.7 * iterations);
            final AtomicInteger            finishedSingle = new AtomicInteger(iterations);
            final AtomicInteger            finishedZynced = new AtomicInteger(iterations);
            final MovingAverage.Cumulative singleCum      = new MovingAverage.Cumulative();
            final MovingAverage.Cumulative zyncedCum      = new MovingAverage.Cumulative();
            final MovingAverage            singleConv     = new MovingAverage.Converging(0.5);
            final MovingAverage            zyncedConv     = new MovingAverage.Converging(0.5);
    
            // -----------------------------------------------------------
            // -----------------------------------------------------------
            public static void main(String[] args) {
                    final Test test = new Test();
    
                    for (int i = 0; i < test.iterations; i++) {
                            test.benchmark(i);
                    }
    
                    Threads.sleep(1000000);
            }
            // -----------------------------------------------------------
            // -----------------------------------------------------------
    
            void benchmark(int i) {
    
                    Threads.async(()->{
                            long start = System.nanoTime();
    
                            for (int j = 0; j < jiterations; j++) {
                                    a();
                            }
    
                            long elapsed = System.nanoTime() - start;
                            int v = this.finishedSingle.decrementAndGet();
                            if ( v <= count ) {
                                    singleCum.add (elapsed);
                                    singleConv.add(elapsed);
                            }
    
                            if ( v == 0 ) {
                                    System.out.println(elapsed);
                                    System.out.println("Single Cum:\t\t" + singleCum.val());
                                    System.out.println("Single Conv:\t" + singleConv.val());
                                    System.out.println();
    
                            }
                    });
    
                    Threads.async(()->{
    
                            long start = System.nanoTime();
                            for (int j = 0; j < jiterations; j++) {
                                    az();
                            }
    
                            long elapsed = System.nanoTime() - start;
    
                            int v = this.finishedZynced.decrementAndGet();
                            if ( v <= count ) {
                                    zyncedCum.add(elapsed);
                                    zyncedConv.add(elapsed);
                            }
    
                            if ( v == 0 ) {
                                    // Just to keep this output from overlapping with the one above
                                    Threads.sleep(500);
                                    System.out.println();
                                    System.out.println("Zynced Cum: \t"  + zyncedCum.val());
                                    System.out.println("Zynced Conv:\t" + zyncedConv.val());
                                    System.out.println();
                            }
                    });
    
            }                       
    
            synchronized void a() { b();  }
                         void b() { c();  }
                         void c() { d();  }
                         void d() { e();  }
                         void e() { f();  }
                         void f() { g();  }
                         void g() { h();  }
                         void h() { i();  }
                         void i() { }
    
            synchronized void az() { bz(); }
            synchronized void bz() { cz(); }
            synchronized void cz() { dz(); }
            synchronized void dz() { ez(); }
            synchronized void ez() { fz(); }
            synchronized void fz() { gz(); }
            synchronized void gz() { hz(); }
            synchronized void hz() { iz(); }
            synchronized void iz() {}
    }
    

    MovingAverage.Cumulative.add is basically (performed atomically): average = (average * n + number) / (++n);

    MovingAverage.Converging uses a different formula, which you can look up.
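
    The author's MovingAverage class is not shown, so the following is only a minimal sketch of what the cumulative update described above could look like. The class name CumulativeAverage and its fields are assumptions made for illustration; only the formula comes from the answer.

```java
// Hypothetical stand-in for the answer's MovingAverage.Cumulative (the real class is not shown).
public final class CumulativeAverage {
    private double average = 0.0; // running mean of all samples so far
    private long n = 0;           // number of samples folded in so far

    // Folds one sample into the mean: average = (average * n + number) / (n + 1).
    // synchronized keeps the read-modify-write atomic across threads.
    public synchronized void add(double number) {
        average = (average * n + number) / (n + 1);
        n++;
    }

    public synchronized double val() {
        return average;
    }

    public static void main(String[] args) {
        CumulativeAverage avg = new CumulativeAverage();
        avg.add(10);
        avg.add(20);
        avg.add(60);
        System.out.println(avg.val()); // prints 30.0, the mean of 10, 20 and 60
    }
}
```

    Because every sample carries the same weight, early slow runs keep influencing this average for a long time, which may explain why the Cum and Conv figures below differ so much.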

    The results after a 50 second warmup:

    With: jiterations -> 1000000

    Zynced Cum:     3.2017985649516254E11
    Zynced Conv:    8.11945143126507E10
    
    Single Cum:     4.747368153507841E11
    Single Conv:    8.277793176290959E10
    

    Those are averages in nanoseconds. The difference is small, and the zynced variant even comes out faster.

    With: jiterations -> original * 10 (takes much longer)

    Zynced Cum:     7.462005651190714E11
    Zynced Conv:    9.03751742946726E11
    
    Single Cum:     9.088230941676143E11
    Single Conv:    9.09877020004914E11
    

    As you can see, the results show that the difference really is not big. The zynced variant actually has the lower average over the measured completions.

    With one thread each (iterations = 1) and jiterations = original * 100:

    Zynced Cum:     6.9167088486E10
    Zynced Conv:    6.9167088486E10
    
    Single Cum:     6.9814404337E10
    Single Conv:    6.9814404337E10
    

    In a single-threaded setup (with the Threads.async calls removed):

    With: jiterations -> original * 10

    Single Cum:     2.940499529542545E8
    Single Conv:    5.0342450600964054E7
    
    
    Zynced Cum:     1.1930525617915475E9
    Zynced Conv:    6.672312498662484E8
    

    Here the zynced variant seems to be slower, by up to roughly an order of magnitude. One possible reason is that the zynced loop always runs second in each round; I did not try the reverse order.

    Last test run with:

    public static final class Test {
            final int                      iterations     =     100;
            final int                      jiterations    = 10000000;
            final int                      count          = (int) (0.7 * iterations);
            final AtomicInteger            finishedSingle = new AtomicInteger(iterations);
            final AtomicInteger            finishedZynced = new AtomicInteger(iterations);
            final MovingAverage.Cumulative singleCum      = new MovingAverage.Cumulative();
            final MovingAverage.Cumulative zyncedCum      = new MovingAverage.Cumulative();
            final MovingAverage            singleConv     = new MovingAverage.Converging(0.5);
            final MovingAverage            zyncedConv     = new MovingAverage.Converging(0.5);
    
            // -----------------------------------------------------------
            // -----------------------------------------------------------
            public static void main(String[] args) {
                    final Test test = new Test();
    
                    for (int i = 0; i < test.iterations; i++) {
                            test.benchmark(i);
                    }
    
                    Threads.sleep(1000000);
            }
            // -----------------------------------------------------------
            // -----------------------------------------------------------
    
            void benchmark(int i) {
    
                            long start = System.nanoTime();
    
                            for (int j = 0; j < jiterations; j++) {
                                    a();
                            }
    
                            long elapsed = System.nanoTime() - start;
                            int s = this.finishedSingle.decrementAndGet();
                            if ( s <= count ) {
                                    singleCum.add (elapsed);
                                    singleConv.add(elapsed);
                            }
    
                            if ( s == 0 ) {
                                    System.out.println(elapsed);
                                    System.out.println("Single Cum:\t\t" + singleCum.val());
                                    System.out.println("Single Conv:\t" + singleConv.val());
                                    System.out.println();
    
                            }
    
    
                            long zstart = System.nanoTime();
                            for (int j = 0; j < jiterations; j++) {
                                    az();
                            }
    
                            long elapzed = System.nanoTime() - zstart;
    
                            int z = this.finishedZynced.decrementAndGet();
                            if ( z <= count ) {
                                    zyncedCum.add(elapzed);
                                    zyncedConv.add(elapzed);
                            }
    
                            if ( z == 0 ) {
                                    // Just to keep this output from overlapping with the one above
                                    Threads.sleep(500);
                                    System.out.println();
                                    System.out.println("Zynced Cum: \t"  + zyncedCum.val());
                                    System.out.println("Zynced Conv:\t" + zyncedConv.val());
                                    System.out.println();
                            }
    
            }                       
    
            synchronized void a() { b();  }
                         void b() { c();  }
                         void c() { d();  }
                         void d() { e();  }
                         void e() { f();  }
                         void f() { g();  }
                         void g() { h();  }
                         void h() { i();  }
                         void i() { }
    
            synchronized void az() { bz(); }
            synchronized void bz() { cz(); }
            synchronized void cz() { dz(); }
            synchronized void dz() { ez(); }
            synchronized void ez() { fz(); }
            synchronized void fz() { gz(); }
            synchronized void gz() { hz(); }
            synchronized void hz() { iz(); }
            synchronized void iz() {}
    }
    

    Conclusion: there really is no difference.

  • 2021-01-01 14:26

    Yes, there is an additional performance cost, unless and until the JVM inlines the call to y(), which a modern JIT compiler will do in fairly short order. First, consider the case you've presented in which y() is visible outside the class. In this case, the JVM must check on entering y() to ensure that it can enter the monitor on the object; this check will always succeed when the call is coming from x(), but it can't be skipped, because the call could be coming from a client outside the class. This additional check incurs a small cost.

    Additionally, consider the case in which y() is private. In this case, the compiler still does not optimize away the synchronization; see the following disassembly of an empty y():

    private synchronized void y();
      flags: ACC_PRIVATE, ACC_SYNCHRONIZED
      Code:
        stack=0, locals=1, args_size=1
           0: return
    

    According to the spec's definition of synchronized, each entrance into a synchronized block or method performs a lock action on the object, and leaving performs an unlock action. No other thread can acquire that object's monitor until the lock counter goes down to zero. Presumably some sort of static analysis could demonstrate that a private synchronized method is only ever called from within other synchronized methods, but Java's multi-source-file support would make that fragile at best, even ignoring reflection. This means that the JVM must still increment the counter on entering y():

    Monitor entry on invocation of a synchronized method, and monitor exit on its return, are handled implicitly by the Java Virtual Machine's method invocation and return instructions, as if monitorenter and monitorexit were used.

    @AmolSonawane correctly notes that the JVM may optimize this code at runtime by performing lock coarsening, essentially inlining the y() method. In this case, after the JVM has decided to perform a JIT optimization, calls from x() to y() will not incur any additional performance overhead, but of course calls directly to y() from any other location will still need to acquire the monitor separately.
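
    Intrinsic monitors do not expose their lock counter, so the sketch below uses java.util.concurrent.locks.ReentrantLock as an analogy: its getHoldCount() reports how many times the current thread holds the lock. The class ReentrancyDemo and the field holdsSeenInY are hypothetical names for illustration; this is not how the JVM implements synchronized, only a way to observe the same reentrant counting.

```java
import java.util.concurrent.locks.ReentrantLock;

// Illustration only: ReentrantLock stands in for the object's monitor.
public final class ReentrancyDemo {
    private final ReentrantLock lock = new ReentrantLock();
    int holdsSeenInY = -1; // hold count observed inside y() on the most recent call

    void x() {
        lock.lock();       // plays the role of entering synchronized x()
        try {
            y();           // re-enters the lock this thread already holds
        } finally {
            lock.unlock();
        }
    }

    void y() {
        lock.lock();       // plays the role of entering synchronized y()
        try {
            holdsSeenInY = lock.getHoldCount();
        } finally {
            lock.unlock();
        }
    }

    int holdsNow() {
        return lock.getHoldCount();
    }

    public static void main(String[] args) {
        ReentrancyDemo d = new ReentrancyDemo();
        d.x();
        System.out.println("inside y() via x(): " + d.holdsSeenInY); // 2
        d.y();
        System.out.println("inside y() directly: " + d.holdsSeenInY); // 1
        System.out.println("after both calls: " + d.holdsNow());      // 0
    }
}
```

    The nested acquisition never blocks, but it is still a counter increment and decrement on every call, which is the small cost discussed above until the JIT removes it.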

  • 2021-01-01 14:26

    In the case where both methods are synchronized, you would be locking the monitor twice, so the first approach carries the additional overhead of locking again. However, your JVM can reduce that cost through lock coarsening and may inline the call to y().

  • 2021-01-01 14:30

    Why not test it!? I ran a quick benchmark. The benchMark() method is called in a loop for warm-up. This may not be super accurate, but it does show a consistent and interesting pattern.

    public class Test {
        public static void main(String[] args) {
    
            for (int i = 0; i < 100; i++) {
                System.out.println("+++++++++");
                benchMark();
            }
        }
    
        static void benchMark() {
            Test t = new Test();
            long start = System.nanoTime();
            for (int i = 0; i < 100; i++) {
                t.x();
            }
            System.out.println("Double sync:" + (System.nanoTime() - start) / 1e6);
    
            start = System.nanoTime();
            for (int i = 0; i < 100; i++) {
                t.x1();
            }
            System.out.println("Single sync:" + (System.nanoTime() - start) / 1e6);
        }
        synchronized void x() {
            y();
        }
        synchronized void y() {
        }
        synchronized void x1() {
            y1();
        }
        void y1() {
        }
    }
    

    Results (last 10)

    +++++++++
    Double sync:0.021686
    Single sync:0.017861
    +++++++++
    Double sync:0.021447
    Single sync:0.017929
    +++++++++
    Double sync:0.021608
    Single sync:0.016563
    +++++++++
    Double sync:0.022007
    Single sync:0.017681
    +++++++++
    Double sync:0.021454
    Single sync:0.017684
    +++++++++
    Double sync:0.020821
    Single sync:0.017776
    +++++++++
    Double sync:0.021107
    Single sync:0.017662
    +++++++++
    Double sync:0.020832
    Single sync:0.017982
    +++++++++
    Double sync:0.021001
    Single sync:0.017615
    +++++++++
    Double sync:0.042347
    Single sync:0.023859
    

    Looks like the second variation is indeed slightly faster.
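
    Each measurement above covers only 100 calls, so timer resolution and JIT state can dominate the numbers. The following is a rough sketch of the same comparison with longer loops and explicit warm-up rounds. The class name SyncTiming, the helper nanosPerCall, and the loop counts are all assumptions chosen for illustration, and a hand-rolled harness like this is still much cruder than a dedicated tool such as JMH.

```java
// Rough sketch, not a rigorous benchmark; all names here are hypothetical.
public final class SyncTiming {
    private long counter; // incremented in the innermost method so the calls do real work

    synchronized void x()  { y(); }       // double sync: both x() and y() lock
    synchronized void y()  { counter++; }
    synchronized void x1() { y1(); }      // single sync: only x1() locks
    void y1()              { counter++; }

    long count() {
        return counter;
    }

    // Runs the task `calls` times and returns the average nanoseconds per call.
    static double nanosPerCall(Runnable task, int calls) {
        long start = System.nanoTime();
        for (int i = 0; i < calls; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / (double) calls;
    }

    public static void main(String[] args) {
        SyncTiming t = new SyncTiming();
        int calls = 5_000_000;
        // Warm-up rounds give the JIT a chance to compile both paths before measuring.
        for (int round = 0; round < 5; round++) {
            nanosPerCall(t::x, calls);
            nanosPerCall(t::x1, calls);
        }
        System.out.printf("Double sync: %.2f ns/call%n", nanosPerCall(t::x, calls));
        System.out.printf("Single sync: %.2f ns/call%n", nanosPerCall(t::x1, calls));
        System.out.println("Total calls: " + t.count());
    }
}
```

    Swapping the order of the two measurements is a cheap way to check whether any gap comes from the code itself or from which loop happens to run first.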

  • 2021-01-01 14:31

    There will be no difference, since threads contend only to acquire the lock at x(). The thread that acquired the lock at x() can acquire the lock at y() without any contention (because it is the only thread that can reach that point at any given time). So placing synchronized there has no effect.

  • 2021-01-01 14:40

    Results of a microbenchmark run with JMH:

    Benchmark                      Mean     Mean error    Units
    c.a.p.SO18996783.syncOnce      21.003        0.091  nsec/op
    c.a.p.SO18996783.syncTwice     20.937        0.108  nsec/op
    

    => no statistical difference.

    Looking at the generated assembly shows that lock coarsening has been performed and y_sync has been inlined in x_sync although it is synchronized.

    Full results:

    Benchmarks: 
    # Running: com.assylias.performance.SO18996783.syncOnce
    Iteration   1 (5000ms in 1 thread): 21.049 nsec/op
    Iteration   2 (5000ms in 1 thread): 21.052 nsec/op
    Iteration   3 (5000ms in 1 thread): 20.959 nsec/op
    Iteration   4 (5000ms in 1 thread): 20.977 nsec/op
    Iteration   5 (5000ms in 1 thread): 20.977 nsec/op
    
    Run result "syncOnce": 21.003 ±(95%) 0.055 ±(99%) 0.091 nsec/op
    Run statistics "syncOnce": min = 20.959, avg = 21.003, max = 21.052, stdev = 0.044
    Run confidence intervals "syncOnce": 95% [20.948, 21.058], 99% [20.912, 21.094]
    
    Benchmarks: 
    com.assylias.performance.SO18996783.syncTwice
    Iteration   1 (5000ms in 1 thread): 21.006 nsec/op
    Iteration   2 (5000ms in 1 thread): 20.954 nsec/op
    Iteration   3 (5000ms in 1 thread): 20.953 nsec/op
    Iteration   4 (5000ms in 1 thread): 20.869 nsec/op
    Iteration   5 (5000ms in 1 thread): 20.903 nsec/op
    
    Run result "syncTwice": 20.937 ±(95%) 0.065 ±(99%) 0.108 nsec/op
    Run statistics "syncTwice": min = 20.869, avg = 20.937, max = 21.006, stdev = 0.052
    Run confidence intervals "syncTwice": 95% [20.872, 21.002], 99% [20.829, 21.045]
    