Is there any difference in performance between this
synchronized void x() {
    y();
}

synchronized void y() {
}

and this

synchronized void x() {
    y();
}

void y() {
}
The test code is below (you have to guess what a couple of helper methods do, but nothing complicated). It runs each variant on 100 threads and counts the averages only over the last 70% of completions (the first ~30% serve as warm-up). It prints the results once at the end.
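For context, here is a minimal sketch of the Threads helper used below. This is my assumption of its behavior (async just starts a new thread, sleep wraps Thread.sleep); the real implementation may differ:

// Hypothetical helper assumed by the benchmark; not the original source.
final class Threads {
    // Runs a task asynchronously on a new thread.
    static void async(Runnable task) {
        new Thread(task).start();
    }
    // Thread.sleep without a checked exception.
    static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}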
public static final class Test {
    final int iterations = 100;
    final int jiterations = 1000000;
    final int count = (int) (0.7 * iterations);
    final AtomicInteger finishedSingle = new AtomicInteger(iterations);
    final AtomicInteger finishedZynced = new AtomicInteger(iterations);
    final MovingAverage.Cumulative singleCum = new MovingAverage.Cumulative();
    final MovingAverage.Cumulative zyncedCum = new MovingAverage.Cumulative();
    final MovingAverage singleConv = new MovingAverage.Converging(0.5);
    final MovingAverage zyncedConv = new MovingAverage.Converging(0.5);

    // -----------------------------------------------------------
    // -----------------------------------------------------------

    public static void main(String[] args) {
        final Test test = new Test();
        for (int i = 0; i < test.iterations; i++) {
            test.benchmark(i);
        }
        Threads.sleep(1000000); // keep main alive while the workers run
    }

    // -----------------------------------------------------------
    // -----------------------------------------------------------

    void benchmark(int i) {
        // Worker timing the chain with a single synchronized entry point
        Threads.async(() -> {
            long start = System.nanoTime();
            for (int j = 0; j < jiterations; j++) {
                a();
            }
            long elapsed = System.nanoTime() - start;
            int v = this.finishedSingle.decrementAndGet();
            if (v <= count) {
                singleCum.add(elapsed);
                singleConv.add(elapsed);
            }
            if (v == 0) {
                System.out.println(elapsed);
                System.out.println("Single Cum:\t\t" + singleCum.val());
                System.out.println("Single Conv:\t" + singleConv.val());
                System.out.println();
            }
        });
        // Worker timing the chain where every method is synchronized
        Threads.async(() -> {
            long start = System.nanoTime();
            for (int j = 0; j < jiterations; j++) {
                az();
            }
            long elapsed = System.nanoTime() - start;
            int v = this.finishedZynced.decrementAndGet();
            if (v <= count) {
                zyncedCum.add(elapsed);
                zyncedConv.add(elapsed);
            }
            if (v == 0) {
                // Just to avoid this output overlapping with the one above
                Threads.sleep(500);
                System.out.println();
                System.out.println("Zynced Cum: \t" + zyncedCum.val());
                System.out.println("Zynced Conv:\t" + zyncedConv.val());
                System.out.println();
            }
        });
    }

    synchronized void a() { b(); }
    void b() { c(); }
    void c() { d(); }
    void d() { e(); }
    void e() { f(); }
    void f() { g(); }
    void g() { h(); }
    void h() { i(); }
    void i() { }

    synchronized void az() { bz(); }
    synchronized void bz() { cz(); }
    synchronized void cz() { dz(); }
    synchronized void dz() { ez(); }
    synchronized void ez() { fz(); }
    synchronized void fz() { gz(); }
    synchronized void gz() { hz(); }
    synchronized void hz() { iz(); }
    synchronized void iz() { }
}
MovingAverage.Cumulative.add is basically (performed atomically): average = (average * n + number) / (++n).
MovingAverage.Converging uses a different formula, which you can look up.
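For reference, a minimal sketch of what these classes could look like. The shapes are assumed, and the Converging formula below is my guess at an exponentially weighted average (only Cumulative's formula is given above):

// Assumed shape of the moving averages used above; not the original source.
public abstract class MovingAverage {
    public abstract void add(double number);
    public abstract double val();

    // Exact running mean; add() and val() are synchronized so the
    // "performed atomically" update holds.
    public static final class Cumulative extends MovingAverage {
        private double average;
        private long n;

        @Override
        public synchronized void add(double number) {
            average = (average * n + number) / (++n);
        }

        @Override
        public synchronized double val() {
            return average;
        }
    }

    // Guessed formula: each sample moves the estimate toward itself
    // by the given factor (an exponentially weighted average).
    public static final class Converging extends MovingAverage {
        private final double factor;
        private double average;
        private boolean first = true;

        public Converging(double factor) {
            this.factor = factor;
        }

        @Override
        public synchronized void add(double number) {
            average = first ? number : average + factor * (number - average);
            first = false;
        }

        @Override
        public synchronized double val() {
            return average;
        }
    }
}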
The results after a 50-second warm-up:
With jiterations -> 1000000:
Zynced Cum: 3.2017985649516254E11
Zynced Conv: 8.11945143126507E10
Single Cum: 4.747368153507841E11
Single Conv: 8.277793176290959E10
Those are nanosecond averages. That's really nothing, and it even shows that the zynced one takes less time.
With jiterations -> original * 10 (takes much longer):
Zynced Cum: 7.462005651190714E11
Zynced Conv: 9.03751742946726E11
Single Cum: 9.088230941676143E11
Single Conv: 9.09877020004914E11
As you can see, the results show it's really not a big difference. The zynced one actually has a lower average time over the measured completions.
With one thread each (iterations = 1) and jiterations -> original * 100:
Zynced Cum: 6.9167088486E10
Zynced Conv: 6.9167088486E10
Single Cum: 6.9814404337E10
Single Conv: 6.9814404337E10
In a single-threaded environment (with the Threads.async calls removed):
With jiterations -> original * 10:
Single Cum: 2.940499529542545E8
Single Conv: 5.0342450600964054E7
Zynced Cum: 1.1930525617915475E9
Zynced Conv: 6.672312498662484E8
The zynced one does seem slower here, by roughly an order of magnitude. The reason could be that the zynced run always executes after the single one; I had no energy left to try the reverse order.
Last test run with:
public static final class Test {
    final int iterations = 100;
    final int jiterations = 10000000;
    final int count = (int) (0.7 * iterations);
    final AtomicInteger finishedSingle = new AtomicInteger(iterations);
    final AtomicInteger finishedZynced = new AtomicInteger(iterations);
    final MovingAverage.Cumulative singleCum = new MovingAverage.Cumulative();
    final MovingAverage.Cumulative zyncedCum = new MovingAverage.Cumulative();
    final MovingAverage singleConv = new MovingAverage.Converging(0.5);
    final MovingAverage zyncedConv = new MovingAverage.Converging(0.5);

    // -----------------------------------------------------------
    // -----------------------------------------------------------

    public static void main(String[] args) {
        final Test test = new Test();
        for (int i = 0; i < test.iterations; i++) {
            test.benchmark(i);
        }
        Threads.sleep(1000000);
    }

    // -----------------------------------------------------------
    // -----------------------------------------------------------

    void benchmark(int i) {
        // Same measurement as before, but run inline on the calling thread
        long start = System.nanoTime();
        for (int j = 0; j < jiterations; j++) {
            a();
        }
        long elapsed = System.nanoTime() - start;
        int s = this.finishedSingle.decrementAndGet();
        if (s <= count) {
            singleCum.add(elapsed);
            singleConv.add(elapsed);
        }
        if (s == 0) {
            System.out.println(elapsed);
            System.out.println("Single Cum:\t\t" + singleCum.val());
            System.out.println("Single Conv:\t" + singleConv.val());
            System.out.println();
        }

        long zstart = System.nanoTime();
        for (int j = 0; j < jiterations; j++) {
            az();
        }
        long elapzed = System.nanoTime() - zstart;
        int z = this.finishedZynced.decrementAndGet();
        if (z <= count) {
            zyncedCum.add(elapzed);
            zyncedConv.add(elapzed);
        }
        if (z == 0) {
            // Just to avoid this output overlapping with the one above
            Threads.sleep(500);
            System.out.println();
            System.out.println("Zynced Cum: \t" + zyncedCum.val());
            System.out.println("Zynced Conv:\t" + zyncedConv.val());
            System.out.println();
        }
    }

    synchronized void a() { b(); }
    void b() { c(); }
    void c() { d(); }
    void d() { e(); }
    void e() { f(); }
    void f() { g(); }
    void g() { h(); }
    void h() { i(); }
    void i() { }

    synchronized void az() { bz(); }
    synchronized void bz() { cz(); }
    synchronized void cz() { dz(); }
    synchronized void dz() { ez(); }
    synchronized void ez() { fz(); }
    synchronized void fz() { gz(); }
    synchronized void gz() { hz(); }
    synchronized void hz() { iz(); }
    synchronized void iz() { }
}
Conclusion: there really is no difference.
Yes, there is an additional performance cost, unless and until the JVM inlines the call to y(), which a modern JIT compiler will do in fairly short order. First, consider the case you've presented, in which y() is visible outside the class. In this case, the JVM must check on entering y() to ensure that it can enter the monitor on the object; this check will always succeed when the call is coming from x(), but it can't be skipped, because the call could be coming from a client outside the class. This additional check incurs a small cost.

Additionally, consider the case in which y() is private. In this case, the compiler still does not optimize away the synchronization; see the following disassembly of an empty y():
private synchronized void y();
  flags: ACC_PRIVATE, ACC_SYNCHRONIZED
  Code:
    stack=0, locals=1, args_size=1
       0: return
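(For reference, this disassembly can be reproduced with the JDK's javap tool; -p includes private members and -v prints the access flags and Code attribute shown above:)

javap -p -v Test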
According to the spec's definition of synchronized, each entrance into a synchronized block or method performs a lock action on the object, and leaving performs an unlock action. No other thread can acquire that object's monitor until the lock counter goes down to zero. Presumably some sort of static analysis could demonstrate that a private synchronized method is only ever called from within other synchronized methods, but Java's multi-source-file support would make that fragile at best, even ignoring reflection. This means that the JVM must still increment the counter on entering y():

Monitor entry on invocation of a synchronized method, and monitor exit on its return, are handled implicitly by the Java Virtual Machine's method invocation and return instructions, as if monitorenter and monitorexit were used.
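In other words, for locking purposes a synchronized instance method behaves as if its body were wrapped in a synchronized block on this (my illustration, not part of the spec text above):

class SyncEquivalence {
    // A synchronized instance method...
    synchronized void y() {
    }

    // ...performs the same lock/unlock actions as this explicit block:
    void yEquivalent() {
        synchronized (this) {
        }
    }
}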
@AmolSonawane correctly notes that the JVM may optimize this code at runtime by performing lock coarsening, essentially inlining the y() method. In this case, after the JVM has decided to perform a JIT optimization, calls from x() to y() will not incur any additional performance overhead, but of course calls directly to y() from any other location will still need to acquire the monitor separately.
In the case where both methods are synchronized, you would be locking the monitor twice, so the first approach has the additional overhead of acquiring the lock again. However, the JVM can reduce the cost of locking by lock coarsening and may inline the call to y().
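Conceptually, this is what inlining plus lock coarsening amounts to (a hand-written illustration, not actual JIT output):

class Coarsened {
    // Before JIT: x() and y() each acquire the (reentrant) monitor.
    synchronized void x() { y(); }
    synchronized void y() { }

    // After inlining + coarsening, the JIT effectively executes
    // something like this: one acquisition, y()'s body folded in.
    synchronized void xEffectively() {
        // inlined body of y(); no second monitor operation
    }
}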
Why not test it? I ran a quick benchmark. The benchMark() method is called in a loop for warm-up. This may not be super accurate, but it does show a consistent, interesting pattern.
public class Test {
    public static void main(String[] args) {
        for (int i = 0; i < 100; i++) {
            System.out.println("+++++++++");
            benchMark();
        }
    }

    static void benchMark() {
        Test t = new Test();
        long start = System.nanoTime();
        for (int i = 0; i < 100; i++) {
            t.x();
        }
        System.out.println("Double sync:" + (System.nanoTime() - start) / 1e6);
        start = System.nanoTime();
        for (int i = 0; i < 100; i++) {
            t.x1();
        }
        System.out.println("Single sync:" + (System.nanoTime() - start) / 1e6);
    }

    synchronized void x() {
        y();
    }
    synchronized void y() {
    }
    synchronized void x1() {
        y1();
    }
    void y1() {
    }
}
Results (last 10):
+++++++++
Double sync:0.021686
Single sync:0.017861
+++++++++
Double sync:0.021447
Single sync:0.017929
+++++++++
Double sync:0.021608
Single sync:0.016563
+++++++++
Double sync:0.022007
Single sync:0.017681
+++++++++
Double sync:0.021454
Single sync:0.017684
+++++++++
Double sync:0.020821
Single sync:0.017776
+++++++++
Double sync:0.021107
Single sync:0.017662
+++++++++
Double sync:0.020832
Single sync:0.017982
+++++++++
Double sync:0.021001
Single sync:0.017615
+++++++++
Double sync:0.042347
Single sync:0.023859
Looks like the second variation is indeed slightly faster.
There will be no difference, since threads contend only to acquire the lock at x(). A thread that has acquired the lock at x() can then acquire the lock at y() without any contention (because it is the only thread that can reach that point at any given time). So placing synchronized there has no effect.
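This works because intrinsic locks are reentrant: a thread that already holds the monitor can enter another synchronized method on the same object without blocking. A small sketch to illustrate (class and method names are mine):

public class ReentrancyDemo {
    synchronized void x() {
        System.out.println("in x, holds lock: " + Thread.holdsLock(this));
        y(); // same monitor, already held: no contention, just a counter bump
    }

    synchronized void y() {
        System.out.println("in y, holds lock: " + Thread.holdsLock(this));
    }

    public static void main(String[] args) {
        new ReentrancyDemo().x(); // prints "true" twice; never deadlocks
    }
}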
Results of a micro-benchmark run with JMH:

Benchmark                     Mean     Mean error   Units
c.a.p.SO18996783.syncOnce     21.003   0.091        nsec/op
c.a.p.SO18996783.syncTwice    20.937   0.108        nsec/op

=> no statistical difference.

Looking at the generated assembly shows that lock coarsening has been performed and that y_sync has been inlined into x_sync, although it is synchronized.
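The benchmark was roughly of this shape (a reconstruction using current JMH annotations; x_sync and y_sync appear in the results above, while the names for the once-synchronized pair are mine):

import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class SO18996783 {

    @Benchmark
    public void syncOnce() {
        x_once(); // outer method synchronized, inner not
    }

    @Benchmark
    public void syncTwice() {
        x_sync(); // both methods synchronized
    }

    synchronized void x_sync() { y_sync(); }
    synchronized void y_sync() { }

    synchronized void x_once() { y_plain(); }
    void y_plain() { }
}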
Full results:
Benchmarks:
# Running: com.assylias.performance.SO18996783.syncOnce
Iteration 1 (5000ms in 1 thread): 21.049 nsec/op
Iteration 2 (5000ms in 1 thread): 21.052 nsec/op
Iteration 3 (5000ms in 1 thread): 20.959 nsec/op
Iteration 4 (5000ms in 1 thread): 20.977 nsec/op
Iteration 5 (5000ms in 1 thread): 20.977 nsec/op
Run result "syncOnce": 21.003 ±(95%) 0.055 ±(99%) 0.091 nsec/op
Run statistics "syncOnce": min = 20.959, avg = 21.003, max = 21.052, stdev = 0.044
Run confidence intervals "syncOnce": 95% [20.948, 21.058], 99% [20.912, 21.094]
Benchmarks:
# Running: com.assylias.performance.SO18996783.syncTwice
Iteration 1 (5000ms in 1 thread): 21.006 nsec/op
Iteration 2 (5000ms in 1 thread): 20.954 nsec/op
Iteration 3 (5000ms in 1 thread): 20.953 nsec/op
Iteration 4 (5000ms in 1 thread): 20.869 nsec/op
Iteration 5 (5000ms in 1 thread): 20.903 nsec/op
Run result "syncTwice": 20.937 ±(95%) 0.065 ±(99%) 0.108 nsec/op
Run statistics "syncTwice": min = 20.869, avg = 20.937, max = 21.006, stdev = 0.052
Run confidence intervals "syncTwice": 95% [20.872, 21.002], 99% [20.829, 21.045]