So I\'m aware that nothing is atomic in C++. But I\'m trying to figure out if there are any \"pseudo-atomic\" assumptions I can make. The reason is that I want to avoid using
Most of the answers correctly address the CPU memory ordering issues you're going to experience, but none have detailed how the compiler can thwart your intentions by re-ordering your code in ways that break your assumptions.
Consider an example taken from this post:
volatile int ready;
int message[100];
void foo(int i)
{
message[i/10] = 42;
ready = 1;
}
At -O2
and above, recent versions of GCC and Intel C/C++ (don't know about VC++) will do the store to ready
first, so it can be overlapped with computation of i/10
(volatile
does not save you!):
leaq _message(%rip), %rax
movl $1, _ready(%rip) ; <-- whoa Nelly!
movq %rsp, %rbp
sarl $2, %edx
subl %edi, %edx
movslq %edx,%rdx
movl $42, (%rax,%rdx,4)
This isn't a bug, it's the optimizer exploiting CPU pipelining. If another thread is waiting on ready
before accessing the contents of message
then you have a nasty and obscure race.
Employ compiler barriers to ensure your intent is honored. An example that also exploits the relatively strong ordering of x86 are the release/consume wrappers found in Dmitriy Vyukov's Single-Producer Single-Consumer queue posted here:
// load with 'consume' (data-dependent) memory ordering
// NOTE: x86 specific, other platforms may need additional memory barriers
template
T load_consume(T const* addr)
{
T v = *const_cast(addr);
__asm__ __volatile__ ("" ::: "memory"); // compiler barrier
return v;
}
// store with 'release' memory ordering
// NOTE: x86 specific, other platforms may need additional memory barriers
template
void store_release(T* addr, T v)
{
__asm__ __volatile__ ("" ::: "memory"); // compiler barrier
*const_cast(addr) = v;
}
I suggest that if you are going to venture into the realm of concurrent memory access, use a library that will take care of these details for you. While we all wait for n2145 and std::atomic
check out Thread Building Blocks' tbb::atomic or the upcoming boost::atomic.
Besides correctness, these libraries can simplify your code and clarify your intent:
// thread 1
std::atomic foo; // or tbb::atomic, boost::atomic, etc
foo.store(1, std::memory_order_release);
// thread 2
int tmp = foo.load(std::memory_order_acquire);
Using explicit memory ordering, foo
's inter-thread relationship is clear.