Question
As Anthony Williams said:
some_atomic.load(std::memory_order_acquire) does just drop through to a simple load instruction, and some_atomic.store(std::memory_order_release) drops through to a simple store instruction.
It is known that on x86, load() and store() with memory_order_consume, memory_order_acquire, memory_order_release or memory_order_acq_rel do not require any memory-barrier instructions.
But on ARMv8 we know that there are memory barriers for both load() and store():
http://channel9.msdn.com/Shows/Going+Deep/Cpp-and-Beyond-2012-Herb-Sutter-atomic-Weapons-1-of-2
http://channel9.msdn.com/Shows/Going+Deep/Cpp-and-Beyond-2012-Herb-Sutter-atomic-Weapons-2-of-2
About different architectures of CPUs: http://g.oswego.edu/dl/jmm/cookbook.html
Next, for the CAS operation on x86, these two lines with different memory orderings compile to the same instructions in the disassembly (MSVS2012 x86_64):
a.compare_exchange_weak(temp, 4, std::memory_order_seq_cst, std::memory_order_seq_cst);
000000013FE71A2D mov ebx,dword ptr [temp]
000000013FE71A31 mov eax,ebx
000000013FE71A33 mov ecx,4
000000013FE71A38 lock cmpxchg dword ptr [temp],ecx
a.compare_exchange_weak(temp, 5, std::memory_order_relaxed, std::memory_order_relaxed);
000000013FE71A4D mov ecx,5
000000013FE71A52 mov eax,ebx
000000013FE71A54 lock cmpxchg dword ptr [temp],ecx
Disassembly of the same code compiled by GCC 4.8.1 x86_64 (viewed in GDB):
a.compare_exchange_weak(temp, 4, std::memory_order_seq_cst, std::memory_order_seq_cst);
a.compare_exchange_weak(temp, 5, std::memory_order_relaxed, std::memory_order_relaxed);
0x4613b7 <+0x0027> mov 0x2c(%rsp),%eax
0x4613bb <+0x002b> mov $0x4,%edx
0x4613c0 <+0x0030> lock cmpxchg %edx,0x20(%rsp)
0x4613c6 <+0x0036> mov %eax,0x2c(%rsp)
0x4613ca <+0x003a> lock cmpxchg %edx,0x20(%rsp)
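To reproduce listings like the ones above in isolation, the two calls can be wrapped in separate functions and the generated assembly inspected (for example with `g++ -O2 -S`). This is only a minimal sketch: the helper names `cas_seq_cst`, `cas_relaxed` and `cas_walkthrough` are hypothetical and not part of the original code; only the `compare_exchange_weak` calls mirror the question.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helpers: one CAS per function so each shows up separately
// in the disassembly.
bool cas_seq_cst(std::atomic<int>& a, int& expected, int desired) {
    return a.compare_exchange_weak(expected, desired,
                                   std::memory_order_seq_cst,
                                   std::memory_order_seq_cst);
}

bool cas_relaxed(std::atomic<int>& a, int& expected, int desired) {
    return a.compare_exchange_weak(expected, desired,
                                   std::memory_order_relaxed,
                                   std::memory_order_relaxed);
}

// Single-threaded walk-through of the CAS contract shared by both orderings.
int cas_walkthrough() {
    std::atomic<int> a{3};
    int expected = 0;                       // deliberately wrong guess
    bool ok = cas_seq_cst(a, expected, 4);  // fails: a holds 3, not 0
    assert(!ok && expected == 3);           // failure reloads the current value
    (void)ok;                               // silence warnings under NDEBUG
    // compare_exchange_weak may fail spuriously, so retry until it succeeds.
    while (!cas_relaxed(a, expected, 5)) {
    }
    return a.load(std::memory_order_relaxed);
}
```

The memory-order arguments only change which reorderings are permitted around the operation; the compare-and-swap contract itself (fail, reload `expected`, retry) is the same in both helpers.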
On x86/x86_64 platforms, does any atomic CAS operation, for example atomic_val.compare_exchange_weak(temp, 1, std::memory_order_relaxed, std::memory_order_relaxed);, always behave as if it had std::memory_order_seq_cst ordering?
And if every CAS operation on x86 always runs with sequential consistency (std::memory_order_seq_cst) regardless of the requested ordering, is the same true on ARMv8?
QUESTION: Should a CAS with std::memory_order_relaxed lock the memory bus on x86 or ARM?
ANSWER: On x86, any compare_exchange_weak() operation with any std::memory_order (even std::memory_order_relaxed) always translates to LOCK CMPXCHG, which locks the bus so that the operation is really atomic, and it is as expensive as XCHG: "the cmpxchg is just as expensive as the xchg instruction".
(An addition: XCHG with a memory operand is equivalent to LOCK XCHG, but CMPXCHG is not equivalent to LOCK CMPXCHG; only the latter is really atomic.)
On ARM and PowerPC, compare_exchange_weak() maps to different processor instruction sequences for different std::memory_orders, implemented through LL/SC.
Processor memory-barriers-instructions for x86(except CAS), ARM and PowerPC: http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html
Answer 1:
You shouldn't worry about which instructions the compiler maps a given C11 construct to, as this doesn't capture everything. Instead, you need to develop code with respect to the guarantees of the C11 memory model. As the above comment notes, your compiler or future compilers are free to reorder relaxed memory operations as long as doing so doesn't violate the C11 memory model. It is also worthwhile to run your code through a tool like CDSChecker to see what behaviors are allowed under the memory model.
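As an illustration of coding against the model's guarantees rather than against a particular instruction mapping, here is a minimal release/acquire publication sketch. The function name `publish_and_read` and the variables in it are hypothetical, chosen only for this example.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical example: publish a plain int through an atomic flag.
int publish_and_read() {
    int payload = 0;                 // ordinary, non-atomic data
    std::atomic<bool> ready{false};  // flag that publishes the payload
    int seen = -1;

    std::thread producer([&] {
        payload = 42;                                  // plain store
        ready.store(true, std::memory_order_release);  // publish
    });
    std::thread consumer([&] {
        // The acquire load synchronizes with the release store above, so
        // once the flag reads true the write to payload is visible.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;
    });

    producer.join();
    consumer.join();
    assert(seen == 42);
    return seen;
}
```

The same source is correct whether the target needs no extra instruction for these orderings or needs explicit barrier instructions; that choice is left to the compiler.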
Answer 2:
x86 guarantees that loads following loads are ordered, and stores following stores are ordered. Given that CAS requires both loading and storing, all operations have to be ordered around it.
However, it is worth noting that, in the presence of multiple atomics with memory_order_relaxed, the compiler is allowed to reorder them. It cannot do so with memory_order_seq_cst.
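The difference can be seen in the classic store-buffering test, sketched below under the assumption of two threads and two atomics (all names are hypothetical). With memory_order_seq_cst, as written here, both loads cannot return 0; if every operation were changed to memory_order_relaxed, the memory model would allow that outcome.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical helper: runs the store-buffering test `trials` times and
// reports whether the outcome r1 == 0 && r2 == 0 was ever observed.
bool both_zero_seen(int trials) {
    for (int i = 0; i < trials; ++i) {
        std::atomic<int> x{0}, y{0};
        int r1 = -1, r2 = -1;

        std::thread t1([&] {
            x.store(1, std::memory_order_seq_cst);
            r1 = y.load(std::memory_order_seq_cst);
        });
        std::thread t2([&] {
            y.store(1, std::memory_order_seq_cst);
            r2 = x.load(std::memory_order_seq_cst);
        });
        t1.join();
        t2.join();

        // Under seq_cst there is a single total order of all four
        // operations, so at least one load must observe the other store.
        if (r1 == 0 && r2 == 0) {
            return true;
        }
    }
    return false;
}
```

A run that never sees the forbidden outcome proves nothing about the relaxed variant, of course; the guarantee comes from the memory model, not from the experiment.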
Answer 3:
I think the compiler emits lock cmpxchg even for memory_order_relaxed because that's the only way to make sure the compare+exchange itself is actually atomic. Like artless_noise said in comments, other architectures can use a Load Linked / Store Conditional to implement compare_exchange_weak(...).
memory_order_relaxed should still let the compiler hoist stores of other variables out of loops, and otherwise reorder memory accesses at compile time.
If there was a way to do it on x86 that wasn't also a full memory barrier, a good compiler would use it for memory_order_relaxed.
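The distinction between atomicity and ordering can be sketched with a counter incremented purely through relaxed CAS. The function `relaxed_cas_count` and its parameters are hypothetical, introduced only for illustration. No increment is lost even though no ordering is requested, because the read-modify-write itself must stay atomic.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: `threads` workers each add 1 to a shared counter
// `per_thread` times, using only relaxed compare_exchange_weak.
int relaxed_cas_count(int threads, int per_thread) {
    std::atomic<int> counter{0};
    std::vector<std::thread> pool;

    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&counter, per_thread] {
            for (int i = 0; i < per_thread; ++i) {
                int seen = counter.load(std::memory_order_relaxed);
                // On failure (real or spurious) `seen` is refreshed with the
                // current value, so the retry computes seen + 1 anew.
                while (!counter.compare_exchange_weak(
                           seen, seen + 1,
                           std::memory_order_relaxed,
                           std::memory_order_relaxed)) {
                }
            }
        });
    }
    for (auto& th : pool) {
        th.join();
    }
    return counter.load(std::memory_order_relaxed);
}
```

For example, `relaxed_cas_count(4, 10000)` returns 40000. The retry loop is also the usual way to use compare_exchange_weak on LL/SC architectures, where the store-conditional may fail spuriously.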
Source: https://stackoverflow.com/questions/18577584/do-atomic-cas-operations-on-x86-64-and-arm-always-use-stdmemory-order-seq-cst