In the documentation of std::memory_order on cppreference.com there is an example of relaxed ordering:
Relaxed ordering
Atomic operations tagged
It is sometimes possible for an action to be ordered relative to two other sequences of actions, without implying any relative ordering of the actions in those sequences relative to each other.
Suppose, for example, that one has the following three events:
and the read of p2 is independently ordered after the write of p1 and before the write of p3, but there is no particular ordering in which both p1 and p3 partake. Depending upon what is done with p2, it may be impractical for a compiler to defer p1 past p3 and still achieve the required semantics with p2. Suppose, however, the compiler knew that the above code was part of a larger sequence:
In that case, it could determine that it could reorder the store to p1 after the above code and consolidate it with the following store, thus resulting in code that writes p3 without writing p1 first:
Although it may seem that data dependencies would cause certain parts of sequencing relations to behave transitively, a compiler may identify situations where apparent data dependencies don't exist, and would thus not have the transitive effects one would expect.
I believe cppreference is right. I think this boils down to the "as-if" rule [intro.execution]/1. Compilers are only bound to reproduce the observable behavior of the program described by your code. A sequenced-before relation is only established between evaluations from the perspective of the thread in which these evaluations are performed [intro.execution]/15. That means when two evaluations sequenced one after the other appear somewhere in some thread, the code actually running in that thread must behave as if whatever the first evaluation does did indeed affect whatever the second evaluation does. For example
int x = 0;
x = 42;
std::cout << x;
must print 42. However, the compiler doesn't actually have to store the value 42 into an object x before reading the value back from that object to print it. It may as well remember that the last value to be stored in x was 42 and then simply print the value 42 directly before doing an actual store of the value 42 to x. In fact, if x is a local variable, it may as well just track what value that variable was last assigned at any point and never even create an object or actually store the value 42. There is no way for the thread to tell the difference. The behavior is always going to be as if there was a variable and as if the value 42 were actually stored in an object x before being loaded from that object. But that doesn't mean that the generated machine code has to actually store and load anything anywhere ever. All that is required is that the observable behavior of the generated machine code is indistinguishable from what the behavior would be if all these things were to actually happen.
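To make the as-if argument concrete, here is a minimal sketch. The function names as_written, as_if and check_same_output are made up for this illustration, and a std::ostringstream stands in for std::cout so the output can be compared. The second function is only one possible shape of what an optimizer might produce, not what any particular compiler is guaranteed to emit:

```cpp
#include <cassert>
#include <sstream>
#include <string>

// What the source code says: create x, store 42, load x, print it.
std::string as_written() {
    std::ostringstream out;
    int x = 0;
    x = 42;
    out << x;
    return out.str();
}

// One thing a compiler may effectively do instead: no object x,
// no store, no load, just the value that would have been printed.
std::string as_if() {
    std::ostringstream out;
    out << 42;
    return out.str();
}

// The two versions cannot be told apart by their output.
void check_same_output() {
    assert(as_written() == as_if());
}
```

Both functions produce the same text, which is all the as-if rule asks for.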
If we look at
r2 = x.load(std::memory_order_relaxed); // C
y.store(42, std::memory_order_relaxed); // D
then yes, C is sequenced before D. But when viewed from this thread in isolation, nothing that C does affects the outcome of D. And nothing that D does would change the result of C. The only way one could affect the other would be as an indirect consequence of something happening in another thread. However, by specifying std::memory_order_relaxed, you explicitly stated that the order in which the load and store are observed by another thread is irrelevant. Since no other thread can observe the load and store in any particular order, there is nothing another thread could do to make C and D affect each other in a consistent manner. Thus, the order in which the load and store are actually performed is irrelevant. Thus, the compiler is free to reorder them. And, as mentioned in the explanation underneath that example, if the store from D is performed before the load from C, then r1 == r2 == 42 can indeed come about…
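For reference, here is a sketch of the whole two-thread example as a function you can compile. Thread 1 (lines A and B) is reconstructed from memory of the cppreference example and is not quoted in this thread, so treat it as an assumption. The helper name run_relaxed_once is made up for this illustration:

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>

// Hypothetical helper: runs the example once and returns {r1, r2}.
std::pair<int, int> run_relaxed_once() {
    std::atomic<int> x{0};
    std::atomic<int> y{0};
    int r1 = 0;
    int r2 = 0;

    std::thread t1([&] {
        r1 = y.load(std::memory_order_relaxed);  // A
        x.store(r1, std::memory_order_relaxed);  // B
    });
    std::thread t2([&] {
        r2 = x.load(std::memory_order_relaxed);  // C
        y.store(42, std::memory_order_relaxed);  // D
    });
    t1.join();
    t2.join();

    // The only values ever stored are 0 and 42, so each result is one of those.
    assert(r1 == 0 || r1 == 42);
    assert(r2 == 0 || r2 == 42);
    return {r1, r2};
}
```

The standard permits the result {42, 42}, but you may never actually see it on a given compiler and CPU. Not observing it in a test run doesn't prove it can't happen.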
Even if the compiler emits the machine code for two statements in source order, with the code for the first one placed before the second, CPUs have pipelines internally and can execute instructions in parallel. Statement C is a load instruction. While its data is being fetched from memory, the pipeline keeps processing the next few instructions, and as long as they don't depend on the result of the load, they can end up completing before C has finished (e.g. the data for D was already in cache, while the data for C had to come from main memory).
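If you really need the two statements to take effect in order, you can use a stricter memory order, such as std::memory_order_acquire on the load, std::memory_order_release on the store, or the default std::memory_order_seq_cst. In general, users don't care about the actual order as long as the program is logically correct.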
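As a rough sketch of what a stricter ordering buys you, here is the usual release/acquire hand-off. This is a different, simpler scenario than the x/y example discussed in this thread. The names run_release_acquire_once, payload and ready are made up for this illustration:

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: passes a plain int from one thread to another.
int run_release_acquire_once() {
    int payload = 0;                 // ordinary, non-atomic data
    std::atomic<bool> ready{false};  // flag that publishes the data
    int seen = -1;

    std::thread producer([&] {
        payload = 42;
        // Release: the write to payload cannot be moved after this store.
        ready.store(true, std::memory_order_release);
    });
    std::thread consumer([&] {
        // Acquire: the read of payload cannot be moved before this load.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;
    });
    producer.join();
    consumer.join();

    assert(seen == 42);  // guaranteed by the release/acquire pair
    return seen;
}
```

If both operations on ready were std::memory_order_relaxed instead, the read of payload would be a data race, and the consumer would have no guarantee of seeing 42.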