Is the explanation of relaxed ordering erroneous in cppreference?

问题

In the documentation of std::memory_order on cppreference.com there is an example of relaxed ordering:

Relaxed ordering

Atomic operations tagged memory_order_relaxed are not synchronization operations; they do not impose an order among concurrent memory accesses. They only guarantee atomicity and modification order consistency.

For example, with x and y initially zero,
// Thread 1:
r1 = y.load(std::memory_order_relaxed); // A
x.store(r1, std::memory_order_relaxed); // B
// Thread 2:
r2 = x.load(std::memory_order_relaxed); // C
y.store(42, std::memory_order_relaxed); // D
is allowed to produce r1 == r2 == 42 because, although A is sequenced-before B within thread 1 and C is sequenced before D within thread 2, nothing prevents D from appearing before A in the modification order of y, and B from appearing before C in the modification order of x. The side-effect of D on y could be visible to the load A in thread 1 while the side effect of B on x could be visible to the load C in thread 2. In particular, this may occur if D is completed before C in thread 2, either due to compiler reordering or at runtime.

it says "C is sequenced before D within thread 2".

According to the definition of sequenced-before, which can be found in Order of evaluation, if A is sequenced before B, then evaluation of A will be completed before evaluation of B begins. Since C is sequenced before D within thread 2, C must be completed before D begins, hence the condition part of the last sentence of the snapshot will never be satisfied.

回答1:

I believe cppreference is right. I think this boils down to the "as-if" rule [intro.execution]/1. Compilers are only bound to reproduce the observable behavior of the program described by your code. A sequenced-before relation is only established between evaluations from the perspective of the thread in which these evaluations are performed [intro.execution]/15. That means when two evaluations sequenced one after the other appear somewhere in some thread, the code actually running in that thread must behave as if whatever the first evaluation does did indeed affect whatever the second evaluation does. For example

int x = 0;
x = 42;
std::cout << x;

must print 42. However, the compiler doesn't actually have to store the value 42 into an object x before reading the value back from that object to print it. It may as well remember that the last value to be stored in x was 42 and then simply print the value 42 directly before doing an actual store of the value 42 to x. In fact, if x is a local variable, it may as well just track what value that variable was last assigned at any point and never even create an object or actually store the value 42. There is no way for the thread to tell the difference. The behavior is always going to be as if there was a variable and as if the value 42 were actually stored in an object x before being loaded from that object. But that doesn't mean that the generated machine code has to actually store and load anything anywhere ever. All that is required is that the observable behavior of the generated machine code is indistinguishable from what the behavior would be if all these things were to actually happen.

If we look at

r2 = x.load(std::memory_order_relaxed); // C
y.store(42, std::memory_order_relaxed); // D

then yes, C is sequenced before D. But when viewed from this thread in isolation, nothing that C does affects the outcome of D. And nothing that D does would change the result of C. The only way one could affect the other would be as an indirect consequence of something happening in another thread. However, by specifying std::memory_order_relaxed, you explicitly stated that the order in which the load and store are observed by another thread is irrelevant. Since no other thread can observe the load and store in any particular order, there is nothing another thread could do to make C and D affect each other in a consistent manner. Thus, the order in which the load and store are actually performed is irrelevant. Thus, the compiler is free to reorder them. And, as mentioned in the explanation underneath that example, if the store from D is performed before the load from C, then r1 == r2 == 42 can indeed come about…

回答2:

It is sometimes possible for an action to be ordered relative to two other sequences of actions, without implying any relative ordering of the actions in those sequences relative to each other.

Suppose, for example, that one has the following three events:

store 1 to p1
load p2 into temp
store 2 to p3

and the read of p2 is independently ordered after the write of p1 and before the write of p3, but there is no particular ordering in which both p1 and p3 partake. Depending upon what is done with p2, it may be impractical for a compiler to defer p1 past p3 and still achieve the required semantics with p2. Suppose, however, the compiler knew that the above code was part of a larger sequence:

store 1 to p2 [sequenced before the load of p2]
[do the above]
store 3 into p1 [sequenced after the other store to p1]

In that case, it could determine that it could reorder the store to p1 after the above code and consolidate it with the following store, thus resulting in code that writes p3 without writing p1 first:

set temp to 1
store temp to p2
store 2 to p3
store 3 to p1

Although it may seem that data dependencies would cause certain parts of sequencing relations to behave transitively, a compiler may identify situations where apparent data dependencies don't exist, and would thus not have the transitive effects one would expect.

回答3:

If there are two statements, the compiler will generate code in sequential order so code for the first one will be placed prior to the second one. But cpus internally have pipelines and are able to run assembly operations in parallel. Statement C is a load instruction. While memory is being fetched the pipeline will process the next few instructions and given they are not dependent on the load instruction they could end up being executed prior to C being finished (e.g. data for D was in cache, C in main memory).

If the user really needed the two statements to be executed sequentially, stricter memory ordering operations can be used. In general users don't care as long as the program is logically correct.