The C++11 standard defines a memory model (1.7, 1.10) which contains memory orderings, which are, roughly, "sequentially-consistent", "acquire", "consume", "rele
I'd like to record a partial finding, even though it's not a real answer and doesn't mean that there won't be a big bounty for a proper answer.
After staring at 1.10 for a while, and in particular the very helpful note in paragraph 11, I think this isn't actually so hard. The big difference between synchronizes-with (henceforth: s/w) and dependency-ordered-before (dob) is that a happens-before relationship can be established by concatenating sequenced-before (s/b) and s/w arbitrarily, but not so for dob. Note one of the definitions for inter-thread happens before:
    A synchronizes-with X and X is sequenced before B

But the analogous statement for "A is dependency-ordered before X" is missing!
So with release/acquire (i.e. s/w) we can order arbitrary events:
    A1  s/b  B1                       Thread 1
                 s/w
                     C1  s/b  D1      Thread 2
But now consider an arbitrary sequence of events like this:
    A2  s/b  B2                       Thread 1
                 dob
                     C2  s/b  D2      Thread 2
In this sequence, it is still true that A2 happens-before C2 (because A2 is s/b B2, and B2 inter-thread happens before C2 on account of dob; but we could argue that you can never actually tell!). However, it is not true that A2 happens-before D2. The events A2 and D2 are not ordered with respect to one another, unless it actually holds that C2 carries a dependency to D2. This is a stricter requirement, and absent that requirement, A2-to-D2 cannot be ordered "across" the release/consume pair.
In other words, a release/consume pair only propagates an ordering of actions which carry a dependency from one to the next. Everything that's not dependent is not ordered across the release/consume pair.
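To make that concrete, here is a minimal sketch of the usual case where a dependency does exist: publishing a pointer. The names `Payload`, `payload`, `published`, `unrelated`, `producer` and `consumer` are made up for illustration and do not come from the standard. The read through the loaded pointer carries a dependency from the consume load, so it is ordered; the plain global `unrelated` is not reachable through that dependency chain, so the consumer must not touch it.

```cpp
#include <atomic>
#include <cassert>

// All names here are hypothetical, chosen only for this illustration.
struct Payload { int value; };

Payload payload{0};                          // plain, non-atomic data
std::atomic<Payload*> published{nullptr};    // pointer used for publication
int unrelated = 0;                           // plain data outside the dependency chain

void producer()
{
    unrelated = 7;                                         // "A2": s/b the release store
    payload.value = 42;                                    // also s/b the release store
    published.store(&payload, std::memory_order_release);  // "B2"
}

int consumer()
{
    Payload* p;
    // "C2": spin until the pointer has been published.
    while ((p = published.load(std::memory_order_consume)) == nullptr) { }

    // "D2": p->value is computed from the loaded pointer, so the load
    // carries a dependency into this read and it is ordered after the
    // write to payload.value.
    assert(p->value == 42);

    // Reading `unrelated` here would NOT be ordered by the release/consume
    // pair: nothing carries a dependency from the load to that read, so it
    // would formally be a data race.
    return p->value;
}
```

Run `producer` and `consumer` on two different threads to see the pairing in action. As far as I know, mainstream compilers currently just treat consume as acquire, so the unordered read would probably appear to work in practice; the point is only what the standard guarantees.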
Furthermore, note that the ordering is restored if we append a final, stronger release/acquire pair:
    A2  s/b  B2                                           Thread 1
                 dob
                     C2  s/b  D2                          Thread 2
                                  s/w
                                      E2  s/b  F2         Thread 3
Now, by the quoted rule, D2 inter-thread happens before F2, and therefore so do C2 and B2, and so A2 happens-before F2. But note that there is still no ordering between A2 and D2; the ordering is only between A2 and later events.
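The three-thread chain might look like the following sketch. The names `data`, `first`, `second` and `th1`/`th2`/`th3` are invented for this illustration; the comments map each statement onto the events in the diagram, and the reasoning is the one given above rather than anything I have verified on real hardware.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical names, used only to mirror the A2..F2 diagram.
int data = 0;                  // plain data written by thread 1
std::atomic<int> first{0};     // release/consume link: thread 1 -> thread 2
std::atomic<int> second{0};    // release/acquire link: thread 2 -> thread 3

void th1()
{
    data = 51;                                      // A2
    first.store(1, std::memory_order_release);      // B2
}

void th2()
{
    // C2: consume load, paired with the release store B2 (dob).
    while (first.load(std::memory_order_consume) != 1) { }

    // Reading `data` here would not be ordered after A2, because the
    // consume load carries no dependency into such a read.

    second.store(1, std::memory_order_release);     // D2
}

int th3()
{
    // E2: acquire load, paired with the release store D2 (s/w).
    while (second.load(std::memory_order_acquire) != 1) { }

    // F2: by the chain described above, A2 happens-before this read.
    assert(data == 51);
    return data;
}
```

Each function is meant to run on its own thread. Thread 2 never touches `data`, which is exactly why it can get away with the weaker consume load.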
In summary and in closing, dependency carrying is a strict subset of general sequencing, and release/consume pairs provide an ordering only among actions that carry dependency. As long as no stronger ordering is required (e.g. by passing through a release/acquire pair), there is theoretically a potential for additional optimization, since everything that is not in the dependency chain may be reordered freely.
Maybe here is an example that makes sense?
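    #include <atomic>
    #include <cassert>

    std::atomic<int> foo(0);   // synchronization variable
    int x = 0;                 // plain, non-atomic data

    void thread1()
    {
        x = 51;                                    // sequenced before the release store
        foo.store(10, std::memory_order_release);
    }

    void thread2()
    {
        if (foo.load(std::memory_order_acquire) == 10)
        {
            assert(x == 51);                       // ordered after x = 51 via s/w
        }
    }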
As written, the code is race-free and the assertion will hold, because the release/acquire pair orders the store x = 51 before the load in the assertion. However, by changing "acquire" into "consume", this would no longer be true and the program would have a data race on x, since the load from foo carries no dependency into the read of x in the assertion. The optimization point is that the accesses to x can be reordered freely without concern for what foo is doing, because there is no dependency.