Question
The question is rather easy, in a way. Suppose I have this class:
static class Singleton {
}
And I want to provide a singleton factory for it. I can do the (probably) obvious. I am not going to mention the enum possibility or any other, as they are of no interest to me.
static final class SingletonFactory {

    private static volatile Singleton singleton;

    public static Singleton getSingleton() {
        if (singleton == null) {                            // volatile read
            synchronized (SingletonFactory.class) {
                if (singleton == null) {                    // volatile read
                    singleton = new Singleton();            // volatile write
                }
            }
        }
        return singleton;                                   // volatile read
    }
}
I can eliminate one volatile read at the price of slightly higher code complexity:
public static Singleton improvedGetSingleton() {
    Singleton local = singleton;                            // volatile read
    if (local == null) {
        synchronized (SingletonFactory.class) {
            local = singleton;                              // volatile read
            if (local == null) {
                local = new Singleton();
                singleton = local;                          // volatile write
            }
        }
    }
    return local;                                           // non-volatile read
}
This is pretty much what our code has been using for close to a decade now.
The question is whether I can make this even faster with the release/acquire semantics added in Java 9 via VarHandle:
// uses java.lang.invoke.MethodHandles and java.lang.invoke.VarHandle
static final class SingletonFactory {

    private static final SingletonFactory FACTORY = new SingletonFactory();

    private Singleton singleton;

    private static final VarHandle VAR_HANDLE;

    static {
        try {
            VAR_HANDLE = MethodHandles.lookup()
                    .findVarHandle(SingletonFactory.class, "singleton", Singleton.class);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    private static Singleton getInnerSingleton() {
        Singleton localSingleton = (Singleton) VAR_HANDLE.getAcquire(FACTORY);    // acquire
        if (localSingleton == null) {
            synchronized (SingletonFactory.class) {
                localSingleton = (Singleton) VAR_HANDLE.getAcquire(FACTORY);      // acquire
                if (localSingleton == null) {
                    localSingleton = new Singleton();
                    VAR_HANDLE.setRelease(FACTORY, localSingleton);               // release
                }
            }
        }
        return localSingleton;
    }
}
Would this be a valid and correct implementation?
Answer 1:
Yes, this is correct, and it is present on Wikipedia. (It doesn't matter whether the field is declared volatile, since it is only ever accessed through the VarHandle.)
If the first read sees a stale value, it enters the synchronized block. Since synchronized blocks establish happens-before relationships, the second read will always see the written value. Wikipedia does note that sequential consistency is lost, but that refers to the fields; the synchronized blocks themselves remain sequentially consistent, even though the field accesses use release/acquire semantics.
So the second null check will never succeed, and the object is never instantiated twice.
It is guaranteed that the second read will see the written value, because it is executed with the same lock held as when the value was computed and stored in the variable.
On x86 all loads have acquire semantics, so the only overhead would be the null check. Release/acquire on its own only guarantees that the value becomes visible eventually (that's why the corresponding method was called lazySet before Java 9, and its Javadoc used that exact word); in this scenario the synchronized block prevents that from being a problem.
Instructions may not be reordered out of synchronized blocks (the "roach motel" model: accesses may move into a synchronized block, but not out of it).
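As a side note on the lazySet remark: below is a minimal sketch (the class name is mine, not from the question) of what the pre-Java-9 analogue looks like with AtomicReference. lazySet is the release store; the reads remain full volatile reads via get(), since AtomicReference only gained getAcquire in Java 9.

// uses java.util.concurrent.atomic.AtomicReference
static final class AtomicSingletonFactory {

    private static final AtomicReference<Singleton> REF = new AtomicReference<>();

    public static Singleton getSingleton() {
        Singleton local = REF.get();                        // volatile read
        if (local == null) {
            synchronized (AtomicSingletonFactory.class) {
                local = REF.get();                          // volatile read under the lock
                if (local == null) {
                    local = new Singleton();
                    REF.lazySet(local);                     // release store ("lazySet")
                }
            }
        }
        return local;
    }
}

The store here is a release, but every read is still a full volatile read, so this variant does not save the volatile read that the VarHandle version in the question is after.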
Answer 2:
I am going to try to answer this myself... TL;DR: this is a correct implementation, but it is potentially more expensive than the one with volatile.
Though this looks better, it can under-perform in some cases. I am going to reason from the famous IRIW example: independent reads of independent writes:
volatile x, y
-----------------------------------------------------
 x = 1   |  y = 1   |  int r1 = x  |  int r3 = y
         |          |  int r2 = y  |  int r4 = x
This reads as:
- there are two threads (ThreadA and ThreadB) that write to x and y (x = 1 and y = 1)
- there are two more threads (ThreadC and ThreadD) that read x and y, but in reverse order.
Because x and y are volatile, a result as below is impossible:
r1 = 1 (x) r3 = 1 (y)
r2 = 0 (y) r4 = 0 (x)
This is what the sequential consistency of volatile guarantees. If ThreadC observed the write to x (it saw that x = 1), then ThreadD MUST observe the same x = 1. This is because in a sequentially consistent execution writes happen as-if in a single global order, as-if atomically, visible to everyone; so every single thread must see the same value. This execution is therefore impossible, according to the JLS too:
If a program has no data races, then all executions of the program will appear to be sequentially consistent.
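To make the table above concrete, here is a minimal plain-threads sketch of the litmus test (class and field names are mine). Not observing the forbidden outcome with such an ad-hoc harness proves nothing; a tool like jcstress is the proper way to run these tests. It is only meant to show the four threads of the table as code.

import java.util.concurrent.CyclicBarrier;

public class IriwVolatileTest {

    static volatile int x, y;

    public static void main(String[] args) throws Exception {
        for (long iteration = 0; ; iteration++) {
            x = 0;
            y = 0;
            final int[] r = new int[4];
            CyclicBarrier start = new CyclicBarrier(4);

            Thread a = new Thread(() -> { await(start); x = 1; });              // ThreadA
            Thread b = new Thread(() -> { await(start); y = 1; });              // ThreadB
            Thread c = new Thread(() -> { await(start); r[0] = x; r[1] = y; }); // ThreadC: r1, r2
            Thread d = new Thread(() -> { await(start); r[2] = y; r[3] = x; }); // ThreadD: r3, r4

            a.start(); b.start(); c.start(); d.start();
            a.join(); b.join(); c.join(); d.join();

            // Forbidden for volatile: ThreadC saw x but not y, while ThreadD saw y but not x.
            if (r[0] == 1 && r[1] == 0 && r[2] == 1 && r[3] == 0) {
                System.out.println("non-sequentially-consistent outcome after " + iteration + " iterations");
                return;
            }
        }
    }

    private static void await(CyclicBarrier barrier) {
        try {
            barrier.await();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}

With volatile fields this loop should never print; the point of the next table is that with release/acquire accesses instead, that outcome would be allowed.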
Now if we move the same example to release/acquire (x = 1 and y = 1 are release writes, while the reads are acquires):
non-volatile x, y
-----------------------------------------------------
 x = 1   |  y = 1   |  int r1 = x  |  int r3 = y
         |          |  int r2 = y  |  int r4 = x
A result like:
r1 = 1 (x) r3 = 1 (y)
r2 = 0 (y) r4 = 0 (x)
is possible and allowed. This breaks sequential consistency, and that is expected, since release/acquire is "weaker". On x86, release/acquire does not impose a StoreLoad barrier, so an acquire load is allowed to float above (be reordered with) an earlier release store (unlike volatile, which prohibits this). In simpler words, volatile accesses themselves are not allowed to be reordered with each other, while a sequence like:

release ... // (STORE)
acquire ... // this acquire (LOAD) can float ABOVE the release

is allowed to be "inverted" (reordered), since StoreLoad is not mandatory.
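For completeness, here is a hedged sketch (names are mine) of the same four accesses written with Java 9 VarHandle access modes; the harness above could be reused by routing the reads and writes through these methods instead of the volatile fields. With getAcquire/setRelease the outcome from the table (r1 = 1, r2 = 0, r3 = 1, r4 = 0) is, per the argument above, allowed; switching every access to getVolatile/setVolatile restores the behaviour of the first table.

import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

class IriwAcquireRelease {

    private int x, y; // plain fields, only ever accessed through the VarHandles below

    private static final VarHandle X, Y;

    static {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            X = lookup.findVarHandle(IriwAcquireRelease.class, "x", int.class);
            Y = lookup.findVarHandle(IriwAcquireRelease.class, "y", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    void writeX() { X.setRelease(this, 1); }          // ThreadA: x = 1 (release)
    void writeY() { Y.setRelease(this, 1); }          // ThreadB: y = 1 (release)

    int readX() { return (int) X.getAcquire(this); }  // acquire read of x
    int readY() { return (int) Y.getAcquire(this); }  // acquire read of y
    // ThreadC: r1 = readX(); r2 = readY();
    // ThreadD: r3 = readY(); r4 = readX();
}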
Reasoning with barriers is admittedly somewhat sloppy and beside the point, because the JLS does not explain things in terms of barriers; unfortunately, release/acquire accesses are not documented in the JLS yet either...
If I extrapolate this to the SingletonFactory example, it means that after a release:

VAR_HANDLE.setRelease(FACTORY, localSingleton);

any other thread that does an acquire:

Singleton localSingleton = (Singleton) VAR_HANDLE.getAcquire(FACTORY);

is not guaranteed to read the value written by that release (a non-null Singleton).
Think about it: in the case of volatile, if one thread has seen the volatile write, every other thread will, for sure, see it too. There is no such guarantee with release/acquire.
As such, with release/acquire every thread might need to enter the synchronized block, and this might happen for many threads, because it is really unknown when the store made by the release will become visible to the acquiring load.
And even though synchronized itself does provide happens-before ordering, this code, at least for some time (until the release is observed), is going to perform worse (I assume so): every thread competes to enter the synchronized block.
So in the end, this comes down to which is more expensive: a volatile store, or a release that is only eventually seen? I have no answer to this one.
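One way to probe the cost question empirically would be a JMH benchmark along these lines. This is only a sketch under assumptions: it presumes the two factories from the question are given distinct names (VolatileFactory and AcquireFactory here are hypothetical) with their getters made accessible, and it only measures the steady state after the singleton is published, not the transient window discussed above.

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Threads;

@State(Scope.Benchmark)
@Threads(8) // contended readers
public class SingletonReadBenchmark {

    @Benchmark
    public Object volatileRead() {
        // double-checked locking with a volatile field (first variant in the question; name hypothetical)
        return VolatileFactory.getSingleton();
    }

    @Benchmark
    public Object acquireRead() {
        // double-checked locking via VarHandle getAcquire/setRelease (last variant in the question; name hypothetical)
        return AcquireFactory.getInnerSingleton();
    }
}

Returning the singleton from the benchmark methods keeps the reads from being dead-code-eliminated.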
Source: https://stackoverflow.com/questions/65171886/double-check-locking-without-volatile-but-with-varhandle-release-acquire