Question
Say for example, I have an exclusive atomic-ops-based spin lock implementation as below:
bool TryLock(volatile TInt32* pFlag)
{
    return !(AtomicOps::Exchange32(pFlag, 1) == 1);
}

void Lock(volatile TInt32* pFlag)
{
    while (AtomicOps::Exchange32(pFlag, 1) == 1) {
        AtomicOps::ThreadYield();
    }
}

void Unlock(volatile TInt32* pFlag)
{
    *pFlag = 0; // is this OK? or is atomicity needed here for the load and store as well?
}
where AtomicOps::Exchange32 is implemented on Windows using InterlockedExchange and on Linux using __atomic_exchange_n.
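For reference, such a wrapper might look roughly like this (a sketch; the exact names and the memory-order choice are assumptions, since only the function name is given above):

```cpp
#include <cstdint>
#if defined(_WIN32)
#include <windows.h>
#endif

typedef int32_t TInt32;

namespace AtomicOps {
// Hypothetical wrapper: atomically stores newVal into *pTarget and
// returns the previous value.
inline TInt32 Exchange32(volatile TInt32* pTarget, TInt32 newVal)
{
#if defined(_WIN32)
    return InterlockedExchange(reinterpret_cast<volatile LONG*>(pTarget), newVal);
#else
    // __ATOMIC_SEQ_CST is a full barrier; __ATOMIC_ACQUIRE would be the
    // minimum needed for lock acquisition (see the answers below on barriers).
    return __atomic_exchange_n(pTarget, newVal, __ATOMIC_SEQ_CST);
#endif
}
}
```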
Answer 1:
In most cases, for releasing the resource, just resetting the lock to zero (as you do) is almost OK (e.g. on an Intel Core processor), but you also need to make sure that the compiler will not reorder instructions (see below, and see also g-v's post). If you want to be rigorous (and portable), there are two things that need to be considered:
What the compiler does: it may reorder instructions to optimize the code, and thus introduce subtle bugs if it is not "aware" of the multithreaded nature of the code. To avoid that, it is possible to insert a compiler barrier.
What the processor does: some processors (like the Intel Itanium, used in professional servers, or the ARM processors used in smartphones) have a so-called "relaxed memory model". In practice, it means that the processor may decide to change the order of operations. Again, this can be avoided by using special instructions (load barriers and store barriers). For instance, on an ARM processor, the DMB instruction ensures that all store operations are completed before the next instruction (and it needs to be inserted in the function that releases a lock).
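As an illustration of the first point, a compiler barrier costs no CPU instruction at all (a sketch, assuming GCC/Clang syntax; the function names here are illustrative):

```cpp
// Compiler barrier: forbids the compiler from moving memory accesses
// across this point; it emits no machine instruction.
inline void CompilerBarrier()
{
    __asm__ __volatile__("" ::: "memory"); // MSVC equivalent: _ReadWriteBarrier()
}

// Releasing a lock on a strongly ordered CPU (e.g. x86), as discussed above.
void UnlockOnX86(volatile int* pFlag)
{
    CompilerBarrier(); // keep critical-section stores before the flag reset
    *pFlag = 0;
}
```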
Conclusion: it is very tricky to make this kind of code correct. If you have compiler / OS support for these functionalities (e.g., stdatomic.h in C, or std::atomic in C++11), it is much better to rely on them than to write your own (but sometimes you have no choice). In the specific case of a standard Intel Core processor, I think that what you do is correct, provided you insert a compiler barrier in the release operation (see g-v's post).
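For instance, a minimal portable spinlock on top of std::atomic_flag (C++11) might be sketched as:

```cpp
#include <atomic>

// Minimal spinlock sketch using std::atomic_flag; the standard library
// inserts the required compiler and memory barriers for us.
class SpinLock {
public:
    void Lock()
    {
        // test_and_set with acquire ordering: spin until the flag was clear.
        while (m_flag.test_and_set(std::memory_order_acquire)) {
            // busy-wait; a yield could go here
        }
    }
    bool TryLock()
    {
        return !m_flag.test_and_set(std::memory_order_acquire);
    }
    void Unlock()
    {
        m_flag.clear(std::memory_order_release); // release barrier included
    }
private:
    std::atomic_flag m_flag = ATOMIC_FLAG_INIT;
};
```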
On compile-time versus run-time memory ordering, see: https://en.wikipedia.org/wiki/Memory_ordering
My code for some atomic / spinlocks implemented on different architectures: http://alice.loria.fr/software/geogram/doc/html/atomics_8h.html (but I'm unsure it's 100 % correct)
Answer 2:
You need two memory barriers in a spinlock implementation:
- an "acquire barrier" or "import barrier" in TryLock() and Lock(). It forces operations issued while the spinlock is held to become visible only after the pFlag value is updated.
- a "release barrier" or "export barrier" in Unlock(). It forces operations issued before the spinlock is released to become visible before the pFlag value is updated.
You also need two compiler barriers for the same reasons.
See this article for details.
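Applied to the Unlock() from the question, a release store via a GCC builtin provides both the memory barrier and the compiler barrier at once (a sketch; UnlockWithRelease is an illustrative name):

```cpp
#include <cstdint>

typedef int32_t TInt32;

// Sketch: Unlock with a release barrier via a GCC builtin. The release
// ordering makes all writes inside the critical section visible before
// *pFlag becomes 0, and it also acts as a compiler barrier.
void UnlockWithRelease(volatile TInt32* pFlag)
{
    __atomic_store_n(pFlag, 0, __ATOMIC_RELEASE);
}
```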
This approach is for the generic case. On x86/x64:
- there are no separate acquire/release barriers, only a single full barrier (memory fence);
- there is no need for memory barriers here at all, since this architecture is strongly ordered;
- you still need compiler barriers.
More details are provided here.
Below is an example implementation using GCC atomic builtins. It will work for all architectures supported by GCC:
- it will insert acquire/release memory barriers on architectures where they are required (or a full barrier, if separate acquire/release barriers are not supported but the architecture is weakly ordered);
- it will insert compiler barriers on all architectures.
Code:
bool TryLock(volatile bool* pFlag)
{
    // acquire memory barrier and compiler barrier
    return !__atomic_test_and_set(pFlag, __ATOMIC_ACQUIRE);
}

void Lock(volatile bool* pFlag)
{
    for (;;) {
        // acquire memory barrier and compiler barrier
        if (!__atomic_test_and_set(pFlag, __ATOMIC_ACQUIRE)) {
            return;
        }
        // relaxed waiting, usually no memory barriers (optional)
        while (__atomic_load_n(pFlag, __ATOMIC_RELAXED)) {
            CPU_RELAX();
        }
    }
}

void Unlock(volatile bool* pFlag)
{
    // release memory barrier and compiler barrier
    __atomic_clear(pFlag, __ATOMIC_RELEASE);
}
For the "relaxed waiting" loop, see this and this question.
See also Linux kernel memory barriers as a good reference.
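Under the assumption that CPU_RELAX() expands to a CPU pause hint, the implementation above can be exercised with a small stress test (repeated here so the sketch is self-contained):

```cpp
#include <thread>
#include <vector>

#if defined(__x86_64__) || defined(__i386__)
#define CPU_RELAX() __builtin_ia32_pause() // pause hint while spinning
#else
#define CPU_RELAX() ((void)0)
#endif

// The spinlock from the answer above.
bool TryLock(volatile bool* pFlag)
{
    return !__atomic_test_and_set(pFlag, __ATOMIC_ACQUIRE);
}

void Lock(volatile bool* pFlag)
{
    for (;;) {
        if (!__atomic_test_and_set(pFlag, __ATOMIC_ACQUIRE)) {
            return;
        }
        while (__atomic_load_n(pFlag, __ATOMIC_RELAXED)) {
            CPU_RELAX();
        }
    }
}

void Unlock(volatile bool* pFlag)
{
    __atomic_clear(pFlag, __ATOMIC_RELEASE);
}

// Stress test: 4 threads each increment a shared counter 10000 times
// under the lock; the result is 40000 only if the lock is correct.
int RunDemo()
{
    volatile bool flag = false;
    int counter = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < 4; ++t) {
        threads.emplace_back([&] {
            for (int i = 0; i < 10000; ++i) {
                Lock(&flag);
                ++counter; // protected by the spinlock
                Unlock(&flag);
            }
        });
    }
    for (auto& th : threads) {
        th.join();
    }
    return counter;
}
```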
In your implementation:
- Lock() calls AtomicOps::Exchange32(), which already includes a compiler barrier and perhaps an acquire or full memory barrier (we don't know, because you didn't provide the actual arguments to __atomic_exchange_n()).
- Unlock() misses both the memory and the compiler barriers, so it's broken.
Also consider using pthread_spin_lock() if it is an option.
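On POSIX systems that would look roughly like this (a sketch; link with -lpthread, and note that pthread spinlocks are not available on Windows):

```cpp
#include <pthread.h>

// Sketch: the POSIX spinlock API handles all the barriers internally.
int UsePthreadSpinlock()
{
    pthread_spinlock_t lock;
    pthread_spin_init(&lock, PTHREAD_PROCESS_PRIVATE);

    pthread_spin_lock(&lock);   // acquire (includes the needed barriers)
    int result = 42;            // critical section (illustrative work)
    pthread_spin_unlock(&lock); // release

    pthread_spin_destroy(&lock);
    return result;
}
```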
Source: https://stackoverflow.com/questions/32658024/can-atomic-ops-based-spin-locks-unlock-directly-set-the-lock-flag-to-zero