This question is a follow-up/clarification to this:
Does the MOV x86 instruction implement a C++11 memory_order_release atomic store?
This states the M
Refreshing the semantics of acquire and release (quoting cppreference rather than the standard, because it's what I have on hand - the standard is more...verbose, here):
memory_order_acquire: A load operation with this memory order performs the acquire operation on the affected memory location: no reads or writes in the current thread can be reordered before this load. All writes in other threads that release the same atomic variable are visible in the current thread.
memory_order_release: A store operation with this memory order performs the release operation: no reads or writes in the current thread can be reordered after this store. All writes in the current thread are visible in other threads that acquire the same atomic variable.
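To make the two definitions concrete, here is a minimal message-passing sketch. The names `payload`, `ready`, and `run_message_passing` are made up for illustration and do not come from the linked question.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical example: one thread publishes plain data through an atomic flag.
int run_message_passing() {
    int payload = 0;                 // ordinary, non-atomic data
    std::atomic<bool> ready{false};  // the atomic that is released and acquired
    int seen = -1;

    std::thread writer([&] {
        payload = 42;  // plain write that must not move after the store below
        ready.store(true, std::memory_order_release);
    });
    std::thread reader([&] {
        // Spin until the released value is observed.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;  // plain read that must not move before the load above
    });

    writer.join();
    reader.join();
    assert(seen == 42);  // follows from acquire-release synchronization
    return seen;
}
```

Without the release/acquire pair (for example with `memory_order_relaxed` on both sides), the read of `payload` would be a data race under the C++ memory model, whatever the hardware happens to do.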
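This gives us four things to guarantee:
- acquire ordering: no reads or writes in the current thread can be reordered before the load
- release ordering: no reads or writes in the current thread can be reordered after the store
- acquire-release synchronization: all writes in other threads that release the same atomic variable are visible in the current thread
- all writes in the current thread are visible in other threads that acquire the same atomic variable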
Reviewing the guarantees the x86 memory model provides:
- Reads are not reordered with other reads.
- Writes are not reordered with older reads.
- Writes to memory are not reordered with other writes [..]
- Individual processors use the same ordering principles as in a single-processor system.
This is sufficient to satisfy the ordering guarantees.
For acquire ordering, suppose a read of the atomic has occurred: in that thread, any later read or write migrating before it would violate the first or second bullet point, respectively.
For release ordering, suppose a write of the atomic has occurred: in that thread, any prior read or write migrating after it would violate the second or third bullet point, respectively.
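As a rough illustration of the per-thread ordering argument, the two operations can be isolated as below. The helpers `publish` and `observe` and the variable `shared_flag` are hypothetical names. My understanding is that mainstream x86-64 compilers such as GCC and Clang typically emit a plain MOV for each of them, but treat that as an assumption and check your own compiler's output.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical shared atomic used by both helpers.
std::atomic<int> shared_flag{0};

void publish(int value) {
    // Release store: earlier reads and writes in this thread may not be
    // reordered after it, by the compiler or by the processor.
    shared_flag.store(value, std::memory_order_release);
}

int observe() {
    // Acquire load: later reads and writes in this thread may not be
    // reordered before it, by the compiler or by the processor.
    return shared_flag.load(std::memory_order_acquire);
}

// Single-threaded sanity check of the helpers.
int round_trip(int value) {
    publish(value);
    int result = observe();
    assert(result == value);
    return result;
}
```

Note that the memory order argument also restricts the compiler: even if the hardware gives the ordering for free, the compiler is not allowed to move surrounding accesses across these operations in the forbidden direction.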
The only thing left is to ensure that if a thread reads a released store, it will see all the other writes the writer thread had produced up to that point. This is where the other multi-processor guarantee is needed.
- Writes by a single processor are observed in the same order by all processors.
This is sufficient to satisfy acquire-release synchronization.
We've already established that when the release write occurs, all other writes prior to it will have also occurred. This bullet point then ensures that if another thread reads the released write, it will also see every write the writer produced up to that point. (If it did not, it would be observing that single processor's writes in a different order than the one in which that processor performed them, violating the bullet point.)
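The synchronization claim can be exercised with a small stress test. This is only a sketch: `count_violations`, `data`, and `published` are invented names, and a result of zero shows consistency with the guarantee rather than proving it.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stress test: returns how many times the reader saw a released
// counter value without also seeing the plain writes that preceded it.
std::size_t count_violations(std::size_t n) {
    std::vector<int> data(n, -1);           // plain data, one slot per step
    std::atomic<std::size_t> published{0};  // number of slots released so far
    std::size_t violations = 0;             // written only by the reader thread

    std::thread writer([&] {
        for (std::size_t i = 0; i < n; ++i) {
            data[i] = static_cast<int>(i);                      // plain write
            published.store(i + 1, std::memory_order_release);  // release it
        }
    });
    std::thread reader([&] {
        std::size_t seen = 0;
        while (seen < n) {
            std::size_t now = published.load(std::memory_order_acquire);
            // Every slot below `now` was written before the store we just read.
            for (std::size_t i = seen; i < now; ++i) {
                if (data[i] != static_cast<int>(i)) {
                    ++violations;
                }
            }
            seen = now;
        }
    });

    writer.join();
    reader.join();
    assert(violations == 0);  // expected under acquire-release synchronization
    return violations;
}
```

Calling `count_violations(100000)` should return 0. The reader only touches slots whose index is below the counter value it acquired, so it never races with the writer's later plain writes.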