Is it possible that a store with memory_order_relaxed never reaches other threads?

Question

Suppose I have a thread A that writes to an atomic_int x (declared as atomic_int x = 0;) using x.store(1, std::memory_order_relaxed);. Without any other synchronization methods, how long would it take before other threads can see this, using x.load(std::memory_order_relaxed);? Is it possible that the value written to x stays entirely thread-local given the current definition of the C/C++ memory model that the standard gives?

The practical case I have at hand is one where a thread B frequently reads an atomic_bool to check whether it has to quit; another thread, at some point, writes true to this bool and then calls join() on thread B. Clearly I don't mind calling join() before thread B can even see that the atomic_bool was set, nor do I mind if thread B has already seen the change and finished before I call join(). But I am wondering: using memory_order_relaxed on both sides, is it possible to call join() and block "forever" because the change is never propagated to thread B?
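The scenario described above can be sketched roughly as follows. This is a minimal sketch, not the asker's actual code: the names quit_requested, thread_b and run_quit_flag_demo are hypothetical, and the 10 ms sleep only stands in for "at some point". Nothing in the standard bounds how long join() blocks here; on mainstream hardware it returns almost immediately.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical quit flag from the question; both sides use memory_order_relaxed.
std::atomic<bool> quit_requested{false};

// Thread B: keep working until the flag reads true, then record that it left.
void thread_b(bool& saw_quit) {
    while (!quit_requested.load(std::memory_order_relaxed)) {
        std::this_thread::yield();  // stand-in for a unit of real work
    }
    saw_quit = true;
}

// The controlling thread: start B, let it run briefly, set the flag, join.
void run_quit_flag_demo() {
    bool saw_quit = false;
    std::thread b([&saw_quit] { thread_b(saw_quit); });
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
    quit_requested.store(true, std::memory_order_relaxed);
    b.join();  // blocks until B has observed the store and left its loop
    // B's completion synchronizes with join(), so its plain write is visible.
    assert(saw_quit);
}
```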

Edit

I contacted Mark Batty (the brain behind mathematically verifying and subsequently fixing the C++ memory model requirements), originally about something else, which turned out to be a known bug in cppmem and in his thesis. So, fortunately, I didn't make a complete fool of myself, and I took the opportunity to ask him about this too. His answer was:

Q: Can it theoretically be that such a store [memory_order_relaxed without (any following) release operation] never reaches the other thread?
Mark: Theoretically, yes, but I don't think that has been observed.
Q: In other words, do relaxed stores make no sense whatsoever unless you combine them with some release operation (and acquire on the other thread), assuming you want another thread to see it?
Mark: Nearly all of the use cases for them do use release and acquire, yes.
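A minimal sketch of the release/acquire pattern Mark refers to, with hypothetical names (payload, ready, producer, consumer): the data is written with a relaxed store and published by a release store, and the reader spins on an acquire load. What the pair adds is ordering (once the reader sees ready == true, it is guaranteed to see payload == 42), not a deadline; how soon the reader sees ready still rests on the same "should" wording that applies to relaxed stores.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> payload{0};     // hypothetical data written with a relaxed store
std::atomic<bool> ready{false};  // hypothetical flag used to publish it

void producer() {
    payload.store(42, std::memory_order_relaxed);  // relaxed store of the data
    ready.store(true, std::memory_order_release);  // release store publishes it
}

int consumer() {
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    // The acquire load read the value written by the release store, so the
    // relaxed store of 42 happens-before this load.
    int value = payload.load(std::memory_order_relaxed);
    assert(value == 42);
    return value;
}

int run_message_passing_demo() {
    int seen = 0;
    std::thread c([&seen] { seen = consumer(); });
    std::thread p(producer);
    p.join();
    c.join();
    return seen;
}
```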

Answer 1:

This is what the standard says in 29.3.12:

Implementations should make atomic stores visible to atomic loads within a reasonable amount of time.

There is no guarantee that a store will become visible in another thread, there is no guaranteed timing, and there is no formal relationship with memory order.

Of course, on every mainstream architecture a store will become visible, but on rare platforms that do not support cache coherency it may never become visible to a load.
In that case, you would have to reach for an atomic read-modify-write operation to get the latest value in the modification order.
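The read-modify-write suggestion above can be sketched as follows, assuming a hypothetical helper read_latest that performs an RMW which leaves the value unchanged (fetch_add(0)). The standard requires an atomic read-modify-write to read the last value in the modification order, which a plain load is not required to do. On coherent hardware this is simply a more expensive way to read x, and whether it actually helps on an exotic non-coherent platform depends on that implementation.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper (not a standard function): reads x through a
// read-modify-write that adds 0, so the stored value does not change.
// An RMW must read the last value in x's modification order.
int read_latest(std::atomic<int>& x) {
    return x.fetch_add(0, std::memory_order_relaxed);
}

// Usage sketch: thread A stores 1 with relaxed ordering, then we read it back.
int demo_read_latest() {
    std::atomic<int> x{0};
    std::thread a([&x] { x.store(1, std::memory_order_relaxed); });
    a.join();  // join() synchronizes, so even a plain load would see 1 here
    int value = read_latest(x);
    assert(x.load(std::memory_order_relaxed) == 1);  // adding 0 left x unchanged
    return value;
}
```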

Answer 2:

This is all the standard has to say on the matter, I believe:

[intro.multithread]/25 An implementation should ensure that the last value (in modification order) assigned by an atomic or synchronization operation will become visible to all other threads in a finite period of time.

Answer 3:

In practice

Without any other synchronization methods, how long would it take before other threads can see this, using x.load(std::memory_order_relaxed);?

Practically no time. It's a normal store: it goes into the store buffer and reaches the L1d cache in less than the blink of an eye. But that happens only once the store instruction actually executes.

Instructions can be reordered by the compiler, but no reasonable compiler would move an atomic operation across an arbitrarily long loop.

In theory

Q: Can it theoretically be that such a store [memory_order_relaxed without (any following) release operation] never reaches the other thread?

Mark: Theoretically, yes,

You should have asked him what would happen if the following release fence were added back, or if an atomic store with memory_order_release were used instead.

Why wouldn't those be reordered and delayed for a loooong time too (so long that it seems like an eternity in practice)?
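For reference, the fence-based variant mentioned above could look like this (hypothetical names fence_data, fence_flag, writer, reader): a relaxed store followed by a release fence on the writing side, and a relaxed load followed by an acquire fence on the reading side. If the reader's load sees true, the two fences synchronize and the reader must see 7. The fences only constrain ordering, though; nothing here says how soon fence_flag becomes visible.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

std::atomic<int> fence_data{0};       // hypothetical payload
std::atomic<bool> fence_flag{false};  // hypothetical flag, stored with relaxed

void writer() {
    fence_data.store(7, std::memory_order_relaxed);
    std::atomic_thread_fence(std::memory_order_release);  // the release fence
    fence_flag.store(true, std::memory_order_relaxed);
}

void reader(int& out) {
    while (!fence_flag.load(std::memory_order_relaxed)) {
        std::this_thread::yield();
    }
    // This load read the value stored after the release fence, so the
    // release fence synchronizes with this acquire fence.
    std::atomic_thread_fence(std::memory_order_acquire);
    int value = fence_data.load(std::memory_order_relaxed);
    assert(value == 7);
    out = value;
}

int run_fence_demo() {
    int seen = 0;
    std::thread r([&seen] { reader(seen); });
    std::thread w(writer);
    w.join();
    r.join();
    return seen;
}
```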

Is it possible that the value written to x stays entirely thread-local given the current definition of the C/C++ memory model that the standard gives?

If an imaginary and especially perverse implementation wanted to delay the visibility of atomic operations, why would it do that only for relaxed operations? It could just as well do it for all atomic operations.

Or never run some threads.

Or run some threads so slowly that you would believe they aren't running.

Source: https://stackoverflow.com/questions/43749985/is-it-possible-that-a-store-with-memory-order-relaxed-never-reaches-other-thread

Tags: c++, c++11, memory-barriers, relaxed-atomics