Question
Say for example, I have an exclusive atomic-ops-based spin lock implementation as below:
bool TryLock(volatile TInt32* pFlag)
{
    return !(AtomicOps::Exchange32(pFlag, 1) == 1);
}

void Lock(volatile TInt32* pFlag)
{
    while (AtomicOps::Exchange32(pFlag, 1) == 1) {
        AtomicOps::ThreadYield();
    }
}

void Unlock(volatile TInt32* pFlag)
{
    *pFlag = 0; // is this OK? or is atomicity needed here for the load and store as well?
}
where AtomicOps::Exchange32 is implemented on Windows using InterlockedExchange and on Linux using __atomic_exchange_n.
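For reference, such a wrapper might look roughly like this (a sketch; the exact names and the memory-order choice are assumptions, since only the function name is given above):

```cpp
#include <cstdint>
#if defined(_WIN32)
#include <windows.h>
#endif

typedef int32_t TInt32;

namespace AtomicOps {
// Hypothetical wrapper: atomically stores newVal into *pTarget and
// returns the previous value.
inline TInt32 Exchange32(volatile TInt32* pTarget, TInt32 newVal)
{
#if defined(_WIN32)
    return InterlockedExchange(reinterpret_cast<volatile LONG*>(pTarget), newVal);
#else
    // __ATOMIC_SEQ_CST is a full barrier; __ATOMIC_ACQUIRE would be the
    // minimum needed for lock acquisition (see the answers below on barriers).
    return __atomic_exchange_n(pTarget, newVal, __ATOMIC_SEQ_CST);
#endif
}
}
```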
Answer 1:
In most cases, for releasing the resource, just resetting the lock to zero (as you do) is almost OK (e.g. on an Intel Core processor), but you also need to make sure that the compiler will not reorder instructions (see below, and see also g-v's post). If you want to be rigorous (and portable), there are two things that need to be considered:
What the compiler does: it may reorder instructions to optimize the code, and thus introduce subtle bugs if it is not "aware" of the multithreaded nature of the code. To avoid that, it is possible to insert a compiler barrier.
What the processor does: some processors (like the Intel Itanium, used in professional servers, or the ARM processors used in smartphones) have a so-called "relaxed memory model". In practice, it means that the processor may decide to change the order of operations. Again, this can be avoided by using special instructions (load barriers and store barriers). For instance, on an ARM processor, the DMB instruction ensures that all store operations are completed before the next instruction (and it needs to be inserted in the function that releases a lock).
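As an illustration of the first point, a compiler barrier costs no CPU instruction at all (a sketch, assuming GCC/Clang syntax; the function names here are illustrative):

```cpp
// Compiler barrier: forbids the compiler from moving memory accesses
// across this point; it emits no machine instruction.
inline void CompilerBarrier()
{
    __asm__ __volatile__("" ::: "memory"); // MSVC equivalent: _ReadWriteBarrier()
}

// Releasing a lock on a strongly ordered CPU (e.g. x86), as discussed above.
void UnlockOnX86(volatile int* pFlag)
{
    CompilerBarrier(); // keep critical-section stores before the flag reset
    *pFlag = 0;
}
```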
Conclusion: it is very tricky to make this kind of code correct. If you have compiler / OS support for these functionalities (e.g., stdatomic.h in C, or std::atomic in C++11), it is much better to rely on them than to write your own (but sometimes you have no choice). In the specific case of a standard Intel Core processor, I think that what you do is correct, provided you insert a compiler barrier in the release operation (see g-v's post).
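For instance, a minimal portable spinlock on top of std::atomic_flag (C++11) might be sketched as:

```cpp
#include <atomic>

// Minimal spinlock sketch using std::atomic_flag; the standard library
// inserts the required compiler and memory barriers for us.
class SpinLock {
public:
    void Lock()
    {
        // test_and_set with acquire ordering: spin until the flag was clear.
        while (m_flag.test_and_set(std::memory_order_acquire)) {
            // busy-wait; a yield could go here
        }
    }
    bool TryLock()
    {
        return !m_flag.test_and_set(std::memory_order_acquire);
    }
    void Unlock()
    {
        m_flag.clear(std::memory_order_release); // release barrier included
    }
private:
    std::atomic_flag m_flag = ATOMIC_FLAG_INIT;
};
```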
On compile-time versus run-time memory ordering, see: https://en.wikipedia.org/wiki/Memory_ordering
My code for some atomic / spinlocks implemented on different architectures: http://alice.loria.fr/software/geogram/doc/html/atomics_8h.html (but I'm unsure it's 100 % correct)
Answer 2:
You need two memory barriers in a spinlock implementation:
- an "acquire barrier" or "import barrier" in TryLock() and Lock(). It forces operations issued while the spinlock is held to become visible only after the pFlag value is updated.
- a "release barrier" or "export barrier" in Unlock(). It forces operations issued before the spinlock is released to become visible before the pFlag value is updated.
You also need two compiler barriers for the same reasons.
See this article for details.
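Applied to the Unlock() from the question, a release store via a GCC builtin provides both the memory barrier and the compiler barrier at once (a sketch; UnlockWithRelease is an illustrative name):

```cpp
#include <cstdint>

typedef int32_t TInt32;

// Sketch: Unlock with a release barrier via a GCC builtin. The release
// ordering makes all writes inside the critical section visible before
// *pFlag becomes 0, and it also acts as a compiler barrier.
void UnlockWithRelease(volatile TInt32* pFlag)
{
    __atomic_store_n(pFlag, 0, __ATOMIC_RELEASE);
}
```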
This approach is for the generic case. On x86/x64:
- there are no separate acquire/release barriers, only a single full barrier (memory fence);
- there is no need for memory barriers here at all, since this architecture is strongly ordered;
- you still need compiler barriers.
More details are provided here.
Below is an example implementation using GCC atomic builtins. It will work for all architectures supported by GCC:
- it will insert acquire/release memory barriers on architectures where they are required (or a full barrier, if separate acquire/release barriers are not supported but the architecture is weakly ordered);
- it will insert compiler barriers on all architectures.
Code:
bool TryLock(volatile bool* pFlag)
{
    // acquire memory barrier and compiler barrier
    return !__atomic_test_and_set(pFlag, __ATOMIC_ACQUIRE);
}

void Lock(volatile bool* pFlag)
{
    for (;;) {
        // acquire memory barrier and compiler barrier
        if (!__atomic_test_and_set(pFlag, __ATOMIC_ACQUIRE)) {
            return;
        }
        // relaxed waiting, usually no memory barriers (optional)
        while (__atomic_load_n(pFlag, __ATOMIC_RELAXED)) {
            CPU_RELAX();
        }
    }
}

void Unlock(volatile bool* pFlag)
{
    // release memory barrier and compiler barrier
    __atomic_clear(pFlag, __ATOMIC_RELEASE);
}
For the "relaxed waiting" loop, see this and this question.
See also Linux kernel memory barriers as a good reference.
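Under the assumption that CPU_RELAX() expands to a CPU pause hint, the implementation above can be exercised with a small stress test (repeated here so the sketch is self-contained):

```cpp
#include <thread>
#include <vector>

#if defined(__x86_64__) || defined(__i386__)
#define CPU_RELAX() __builtin_ia32_pause() // pause hint while spinning
#else
#define CPU_RELAX() ((void)0)
#endif

// The spinlock from the answer above.
bool TryLock(volatile bool* pFlag)
{
    return !__atomic_test_and_set(pFlag, __ATOMIC_ACQUIRE);
}

void Lock(volatile bool* pFlag)
{
    for (;;) {
        if (!__atomic_test_and_set(pFlag, __ATOMIC_ACQUIRE)) {
            return;
        }
        while (__atomic_load_n(pFlag, __ATOMIC_RELAXED)) {
            CPU_RELAX();
        }
    }
}

void Unlock(volatile bool* pFlag)
{
    __atomic_clear(pFlag, __ATOMIC_RELEASE);
}

// Stress test: 4 threads each increment a shared counter 10000 times
// under the lock; the result is 40000 only if the lock is correct.
int RunDemo()
{
    volatile bool flag = false;
    int counter = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < 4; ++t) {
        threads.emplace_back([&] {
            for (int i = 0; i < 10000; ++i) {
                Lock(&flag);
                ++counter; // protected by the spinlock
                Unlock(&flag);
            }
        });
    }
    for (auto& th : threads) {
        th.join();
    }
    return counter;
}
```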
In your implementation:
- Lock() calls AtomicOps::Exchange32(), which already includes a compiler barrier and perhaps an acquire or full memory barrier (we don't know, because you didn't provide the actual arguments to __atomic_exchange_n()).
- Unlock() misses both the memory and the compiler barriers, so it's broken.
Also consider using pthread_spin_lock() if it is an option.
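On POSIX systems that would look roughly like this (a sketch; link with -lpthread, and note that pthread spinlocks are not available on Windows):

```cpp
#include <pthread.h>

// Sketch: the POSIX spinlock API handles all the barriers internally.
int UsePthreadSpinlock()
{
    pthread_spinlock_t lock;
    pthread_spin_init(&lock, PTHREAD_PROCESS_PRIVATE);

    pthread_spin_lock(&lock);   // acquire (includes the needed barriers)
    int result = 42;            // critical section (illustrative work)
    pthread_spin_unlock(&lock); // release

    pthread_spin_destroy(&lock);
    return result;
}
```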
Source: https://stackoverflow.com/questions/32658024/can-atomic-ops-based-spin-locks-unlock-directly-set-the-lock-flag-to-zero