I am struggling with Section 5.1.2.4 of the C11 Standard, in particular the semantics of Release/Acquire. I note that https://preshing.com/20120913/acquire-and-release-semantic
x86's memory model is basically sequential-consistency plus a store buffer (with store forwarding). So every store is a release-store1. This is why only seq-cst stores need any special instructions. (C/C++11 atomics mappings to asm). Also, https://stackoverflow.com/tags/x86/info has some links to x86 docs, including a formal description of the x86-TSO memory model (basically unreadable for most humans; requires wading through a lot of definitions).
Since you're already reading Jeff Preshing's excellent series of articles, I'll point you at another one that goes into more detail: https://preshing.com/20120930/weak-vs-strong-memory-models/
The only reordering that's allowed on x86 is StoreLoad, not LoadStore, if we're talking in those terms. (Store forwarding can do extra fun stuff if a load only partially overlaps a store; Globally Invisible load instructions, although you'll never get that in compiler-generated code for stdatomic
.)
@EOF commented with the right quote from Intel's manual:
Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 3 (3A, 3B, 3C & 3D): System Programming Guide, 8.2.3.3 Stores Are Not Reordered With Earlier Loads.
Footnote 1: ignoring weakly-ordered NT stores; this is why you normally sfence
after doing NT stores. C11 / C++11 implementations assume you aren't using NT stores. If you are, use _mm_sfence
before a release operation to make sure it respects your NT stores. (In general don't use _mm_mfence / _mm_sfence in other cases; usually you only need to block compile-time reordering. Or of course just use stdatomic.)