Lock-free programming: reordering and memory order semantics

问题

I am trying to find my feet in lock-free programming. Having read different explanations for memory ordering semantics, I would like to clear up what possible reordering may happen. As far as I understood, instructions may be reordered by the compiler (due to optimization when the program is compiled) and CPU (at runtime?).

For the relaxed semantics cpp reference provides the following example:

// Thread 1:
r1 = y.load(memory_order_relaxed); // A
x.store(r1, memory_order_relaxed); // B
// Thread 2:
r2 = x.load(memory_order_relaxed); // C 
y.store(42, memory_order_relaxed); // D

It is said that with x and y initially zero the code is allowed to produce r1 == r2 == 42 because, although A is sequenced-before B within thread 1 and C is sequenced before D within thread 2, nothing prevents D from appearing before A in the modification order of y, and B from appearing before C in the modification order of x. How could that happen? Does it imply that C and D get reordered, so the execution order would be DABC? Is it allowed to reorder A and B?

For the acquire-release semantics there is the following sample code:

std::atomic<std::string*> ptr;
int data;

void producer()
{
    std::string* p  = new std::string("Hello");
    data = 42;
    ptr.store(p, std::memory_order_release);
}

void consumer()
{
    std::string* p2;
    while (!(p2 = ptr.load(std::memory_order_acquire)))
        ;
    assert(*p2 == "Hello"); // never fires
    assert(data == 42); // never fires
}

I'm wondering what if we used relaxed memory order instead of acquire? I guess, the value of data could be read before p2 = ptr.load(std::memory_order_relaxed), but what about p2?

Finally, why it is fine to use relaxed memory order in this case?

template<typename T>
class stack
{
    std::atomic<node<T>*> head;
 public:
    void push(const T& data)
    {
      node<T>* new_node = new node<T>(data);

      // put the current value of head into new_node->next
      new_node->next = head.load(std::memory_order_relaxed);

      // now make new_node the new head, but if the head
      // is no longer what's stored in new_node->next
      // (some other thread must have inserted a node just now)
      // then put that new head into new_node->next and try again
      while(!head.compare_exchange_weak(new_node->next, new_node,
                                        std::memory_order_release,
                                        std::memory_order_relaxed))
          ; // the body of the loop is empty
    }
};

I mean both head.load(std::memory_order_relaxed) and head.compare_exchange_weak(new_node->next, new_node, std::memory_order_release, std::memory_order_relaxed).

To summarize all the above, my question is essentially when do I have to care about potential reordering and when I don't?

回答1:

For #1, compiler may issue the store to y before the load from x (there are no dependencies), and even if it doesn't, the load from x can be delayed at cpu/memory level.

For #2, p2 would be nonzero, but neither *p2 nor data would necessarily have a meaningful value.

For #3 there is only one act of publishing non-atomic stores made by this thread, and it is a release

You should always care about reordering, or, better, not assume any order: neither C++ nor hardware executes code top to bottom, they only respect dependencies.

来源：https://stackoverflow.com/questions/40127280/lock-free-programming-reordering-and-memory-order-semantics

标签

multithreading

c++11

atomic

lock-free

memory-barriers