So I\'m aware that nothing is atomic in C++. But I\'m trying to figure out if there are any \"pseudo-atomic\" assumptions I can make. The reason is that I want to avoid using
There are no threads in standard C++, and Threads cannot be implemented as a library.
Therefore, the standard has nothing to say about the behaviour of programs which use threads. You must look to whatever additional guarantees are provided by your threading implementation.
That said, in threading implementations I've used:
(1) yes, you can assume that irrelevant values aren't written to variables. Otherwise the whole memory model goes out the window. But be careful that when you say "another thread" never sets b
to false, that means anywhere, ever. If it does, that write could perhaps be re-ordered to occur during your loop.
(2) no, the compiler can re-order the assignments to b1 and b2, so it is possible for b1 to end up true and b2 false. In such a simple case I don't know why it would re-order, but in more complex cases there might be very good reasons.
[Edit: oops, by the time I got to answering (2) I'd forgotten that b was volatile. Reads from a volatile variable won't be re-ordered, sorry, so yes on a typical threading implementation (if there is any such thing), you can assume that you won't end up with b1 true and b2 false.]
(3) same as 1. volatile
in general has nothing to do with threading at all. However, it is quite exciting in some implementations (Windows), and might in effect imply memory barriers.
(4) on an architecture where int
writes are atomic yes, although volatile
has nothing to do with it. See also...
(5) check the docs carefully. Likely yes, and again volatile is irrelevant, because on almost all architectures int
writes are atomic. But if int
write is not atomic, then no (and no for the previous question), even if it's volatile you could in principle get a different value. Given those values 7 and 8, though, we're talking a pretty weird architecture for the byte containing the relevant bits to be written in two stages, but with different values you could more plausibly get a partial write.
For a more plausible example, suppose that for some bizarre reason you have a 16 bit int on a platform where only 8bit writes are atomic. Odd, but legal, and since int
must be at least 16 bits you can see how it could come about. Suppose further that your initial value is 255. Then increment could legally be implemented as:
A read-only thread which interrupted the incrementing thread between the third and fourth steps of that, could see the value 511. If the writes are in the other order, it could see 0.
An inconsistent value could be left behind permanently if one thread is writing 255, another thread is concurrently writing 256, and the writes get interleaved. Impossible on many architectures, but to know that this won't happen you need to know at least something about the architecture. Nothing in the C++ standard forbids it, because the C++ standard talks about execution being interrupted by a signal, but otherwise has no concept of execution being interrupted by another part of the program, and no concept of concurrent execution. That's why threads aren't just another library - adding threads fundamentally changes the C++ execution model. It requires the implementation to do things differently, as you'll eventually discover if for example you use threads under gcc and forget to specify -pthreads
.
The same could happen on a platform where aligned int
writes are atomic, but unaligned int
writes are permitted and not atomic. For example IIRC on x86, unaligned int
writes are not guaranteed atomic if they cross a cache line boundary. x86 compilers will not mis-align a declared int
variable, for this reason and others. But if you play games with structure packing you could probably provoke an example.
So: pretty much any implementation will give you the guarantees you need, but might do so in quite a complicated way.
In general, I've found that it is not worth trying to rely on platform-specific guarantees about memory access, that I don't fully understand, in order to avoid mutexes. Use a mutex, and if that's too slow use a high-quality lock-free structure (or implement a design for one) written by someone who really knows the architecture and compiler. It will probably be correct, and subject to correctness will probably outperform anything I invent myself.
If your C++ implementation supplies the library of atomic operations specified by n2145 or some variant thereof, you can presumably rely on it. Otherwise, you cannot in general rely on "anything" about atomicity at the language level, since multitasking of any kind (and therefore atomicity, which deals with multitasking) is not specified by the existing C++ standard.
It's generally a really, really bad idea to depend on this, as you could end up with bad things happening and only one some architectures. The best solution would be to use a guaranteed atomic API, for example the Windows Interlocked api.
My answer is going to be frustrating: No, No, No, No, and No.
1-4) The compiler is allowed to do ANYTHING it pleases with a variable it writes to. It may store temporary values in it, so long as ends up doing something that would do the same thing as that thread executing in a vacuum. ANYTHING is valid
5) Nope, no guarantee. If a variable is not atomic, and you write to it on one thread, and read or write to it on another, it is a race case. The spec declares such race cases to be undefined behavior, and absolutely anything goes. That being said, you will be hard pressed to find a compiler that does not give you 7 or 8, but it IS legal for a compiler to give you something else.
I always refer to this highly comical explanation of race cases.
http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong
May be this thread is ancient, but the C++ 11 standard DOES have a thread library and also a vast atomic library for atomic operations. The purpose is specifically for concurrency support and avoid data races. The relevant header is atomic
Most of the answers correctly address the CPU memory ordering issues you're going to experience, but none have detailed how the compiler can thwart your intentions by re-ordering your code in ways that break your assumptions.
Consider an example taken from this post:
volatile int ready;
int message[100];
void foo(int i)
{
message[i/10] = 42;
ready = 1;
}
At -O2
and above, recent versions of GCC and Intel C/C++ (don't know about VC++) will do the store to ready
first, so it can be overlapped with computation of i/10
(volatile
does not save you!):
leaq _message(%rip), %rax
movl $1, _ready(%rip) ; <-- whoa Nelly!
movq %rsp, %rbp
sarl $2, %edx
subl %edi, %edx
movslq %edx,%rdx
movl $42, (%rax,%rdx,4)
This isn't a bug, it's the optimizer exploiting CPU pipelining. If another thread is waiting on ready
before accessing the contents of message
then you have a nasty and obscure race.
Employ compiler barriers to ensure your intent is honored. An example that also exploits the relatively strong ordering of x86 are the release/consume wrappers found in Dmitriy Vyukov's Single-Producer Single-Consumer queue posted here:
// load with 'consume' (data-dependent) memory ordering
// NOTE: x86 specific, other platforms may need additional memory barriers
template<typename T>
T load_consume(T const* addr)
{
T v = *const_cast<T const volatile*>(addr);
__asm__ __volatile__ ("" ::: "memory"); // compiler barrier
return v;
}
// store with 'release' memory ordering
// NOTE: x86 specific, other platforms may need additional memory barriers
template<typename T>
void store_release(T* addr, T v)
{
__asm__ __volatile__ ("" ::: "memory"); // compiler barrier
*const_cast<T volatile*>(addr) = v;
}
I suggest that if you are going to venture into the realm of concurrent memory access, use a library that will take care of these details for you. While we all wait for n2145 and std::atomic
check out Thread Building Blocks' tbb::atomic or the upcoming boost::atomic.
Besides correctness, these libraries can simplify your code and clarify your intent:
// thread 1
std::atomic<int> foo; // or tbb::atomic, boost::atomic, etc
foo.store(1, std::memory_order_release);
// thread 2
int tmp = foo.load(std::memory_order_acquire);
Using explicit memory ordering, foo
's inter-thread relationship is clear.