I've been reading about the new C++11 memory model and I've come upon the std::kill_dependency function (§29.3/14-15). I'm struggling to understand why
My guess is that it enables this optimization.
r1 = x.load(memory_order_consume);
do_something_with(a[r1->index]);
In addition to the other answer, I will point out that Scott Meyers, one of the definitive leaders in the C++ community, bashed memory_order_consume pretty strongly. He basically said that he believed it had no place in the standard. He said there are two cases where memory_order_consume has any effect:
Yes, once again, the DEC Alpha finds its way into infamy by using an optimization not seen in any other chip until many years later on absurdly specialized machines.
The particular optimization is that those processors allow one to dereference a field before actually getting the address of that field (i.e. the processor can look up x->y BEFORE it even looks up x, using a predicted value of x). It then goes back and checks whether x was the value it expected. On success, it saves time. On failure, it has to go back and fetch x->y again.
memory_order_consume tells the compiler/architecture that these operations have to happen in order. However, in the most useful case, one will end up wanting to do (x->y.z), where z doesn't change. memory_order_consume would force the compiler to keep the accesses to x, y, and z in order. kill_dependency(x->y).z tells the compiler/architecture that it may resume doing such nefarious reorderings.
99.999% of developers will probably never work on a platform where this feature is required (or has any effect at all).
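To make the kill_dependency(x->y).z expression concrete, here is a minimal sketch. The types Inner and Outer, the global shared_outer, and the function read_z are hypothetical names invented for illustration. Note also that mainstream compilers typically implement memory_order_consume as memory_order_acquire, so in practice this may compile to the same code with or without the kill_dependency call.

```cpp
#include <atomic>

// Hypothetical types for illustration only.
struct Inner { int z; };
struct Outer { Inner y; };

// Pointer published by some writer thread (assumed to use a release store).
std::atomic<Outer*> shared_outer{nullptr};

int read_z() {
    // The consume load starts a dependency chain rooted at x.
    Outer* x = shared_outer.load(std::memory_order_consume);
    if (x == nullptr) return -1;  // nothing published yet

    // x->y carries a dependency from the load. The copy returned by
    // kill_dependency does not, so reading .z from it is not required
    // to be dependency-ordered after the load of x.
    return std::kill_dependency(x->y).z;
}
```

This is only appropriate if z really is stable for the lifetime of the object, which is the assumption the answer above makes.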
The usual use case of kill_dependency arises from the following. Suppose you want to do atomic updates to a nontrivial shared data structure. A typical way to do this is to nonatomically create some new data and to atomically swing a pointer from the data structure to the new data. Once you do this, you are not going to change the new data until you have swung the pointer away from it to something else (and waited for all readers to vacate). This paradigm is widely used, e.g. read-copy-update in the Linux kernel.
Now, suppose the reader reads the pointer, reads the new data, and comes back later and reads the pointer again, finding that the pointer hasn't changed. The hardware can't tell that the pointer hasn't been updated again, so under consume semantics the reader can't use a cached copy of the data but has to read it again from memory. (Or to think of it another way, the hardware and compiler can't speculatively move the read of the data up before the read of the pointer.)
This is where kill_dependency comes to the rescue. By wrapping the pointer in a kill_dependency, you create a value that will no longer propagate dependency, allowing accesses through the pointer to use the cached copy of the new data.
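A rough sketch of that read, re-read, and reuse pattern follows. Snapshot, g_current, publish_snapshot, and read_twice_sum are hypothetical names, and the code assumes the writer never modifies a Snapshot after publishing it, as described above. Reclamation of old snapshots is deliberately left out.

```cpp
#include <atomic>

// Hypothetical payload; assumed immutable once published.
struct Snapshot { int value; };

std::atomic<Snapshot*> g_current{nullptr};

// Writer: the data is built non-atomically beforehand, then the pointer
// is swung with a release store.
void publish_snapshot(Snapshot* fresh) {
    g_current.store(fresh, std::memory_order_release);
}

// Reader: reads the data, then comes back later and reads it again.
int read_twice_sum() {
    Snapshot* p = g_current.load(std::memory_order_consume);
    if (p == nullptr) return -1;   // nothing published yet
    int first = p->value;          // dependency-ordered after the load of p

    Snapshot* q = g_current.load(std::memory_order_consume);
    if (q == p) {
        // Pointer unchanged: go through a value that no longer carries a
        // dependency, so the implementation is free to reuse what it
        // already read instead of re-reading in dependency order.
        Snapshot* plain = std::kill_dependency(p);
        return first + plain->value;
    }
    return first + q->value;       // new snapshot: keep the ordered read
}
```

Whether this buys anything depends on the compiler and the target. On implementations that treat consume as acquire, the kill_dependency call is effectively a no-op.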
The purpose of memory_order_consume is to ensure the compiler does not do certain unfortunate optimizations that may break lockless algorithms. For example, consider this code:
int t;
volatile int a, b;
t = *x;
a = t;
b = t;
A conforming compiler may transform this into:
a = *x;
b = *x;
Thus, a may not equal b. It may also do:
t2 = *x;
// use t2 somewhere
// later
t = *x;
a = t2;
b = t;
By using load(memory_order_consume), we require that uses of the loaded value not be moved ahead of the load itself. In other words,
t = x.load(memory_order_consume);
a = t;
b = t;
assert(a == b); // always true
The standard document considers a case where you may only be interested in ordering certain fields of a structure. The example is:
r1 = x.load(memory_order_consume);
r2 = r1->index;
do_something_with(a[std::kill_dependency(r2)]);
This instructs the compiler that it is allowed to, effectively, do this:
predicted_r2 = x->index; // unordered load
r1 = x; // ordered load
r2 = r1->index;
do_something_with(a[predicted_r2]); // may be faster than waiting for r2's value to be available
Or even this:
predicted_r2 = x->index; // unordered load
predicted_a = a[predicted_r2]; // get the CPU loading it early on
r1 = x; // ordered load
r2 = r1->index; // ordered load
do_something_with(predicted_a);
If the compiler knows that do_something_with won't change the result of the loads for r1 or r2, then it can even hoist it all the way up:
do_something_with(a[x->index]); // completely unordered
r1 = x; // ordered
r2 = r1->index; // ordered
This allows the compiler a little more freedom in its optimization.
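For anyone who wants to experiment, here is a self-contained version of the standard's example. The Node type, the contents of the array a, and the function name lookup are made up for illustration; only the three-line load/index/kill_dependency shape comes from the standard's text quoted above.

```cpp
#include <atomic>

// Hypothetical declarations so the standard's fragment can compile.
struct Node { int index; };

int a[4] = {10, 20, 30, 40};
std::atomic<Node*> x{nullptr};

int lookup() {
    Node* r1 = x.load(std::memory_order_consume);
    if (r1 == nullptr) return -1;   // nothing published yet

    int r2 = r1->index;             // carries a dependency from the load

    // The dependency chain ends here: the read of a[...] is not required
    // to be ordered after the consume load.
    return a[std::kill_dependency(r2)];
}
```

The reads of r1 and r2 stay ordered; only the final array access is released from the dependency chain.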