Question
I have been stepping through the function calls that are involved when I assign to an atomic_long type on VS2017 with a 64-bit project. I specifically wanted to see what happens when I copy an atomic_long into a non-atomic variable, and whether there is any locking around it.
atomic_long ll = 10;
long t2 = ll;
Ultimately it ends up with this call (I've removed some code that was #ifdef'ed out):
inline _Uint4_t _Load_seq_cst_4(volatile _Uint4_t *_Tgt)
{   /* load from *_Tgt atomically with sequentially consistent memory order */
    _Uint4_t _Value;
    _Value = *_Tgt;
    _Compiler_barrier();
    return (_Value);
}
Now, I've read on MSDN that a plain read of a 32-bit value will be atomic:
Simple reads and writes to properly-aligned 32-bit variables are atomic operations.
...which explains why there is no Interlocked function for just reading; only those for changing/comparing. What I'd like to know is what the _Compiler_barrier() bit is doing. This is #defined as
__MACHINE(void _ReadWriteBarrier(void))
...and I've found on MSDN again that this
Limits the compiler optimizations that can reorder memory accesses across the point of the call.
But I don't get this, as there are no other memory accesses apart from the return call; surely the compiler wouldn't move the assignment below that, would it?
Can someone please clarify the purpose of this barrier?
Answer 1:
_Load_seq_cst_4 is an inline function. The compiler barrier is there to block reordering with later code in the calling function that this inlines into.
For example, consider reading a SeqLock. (Over-simplified from this actual implementation).
#include <atomic>

std::atomic<unsigned> sequence;
std::atomic_long value;

long seqlock_try_read() {
    // retry-loop: keep re-reading until we observe a consistent snapshot
    for (;;) {
        unsigned seq1 = sequence;
        long tmpval = value;
        unsigned seq2 = sequence;
        if (seq1 == seq2 && (seq1 & 1) == 0)
            return tmpval;   // sequence was even and unchanged: tmpval is consistent
        // a writer was modifying it; retry the loop
    }
}
If we didn't block compile-time reordering, the compiler could merge both reads of sequence into a single access, perhaps like this:
long tmpval = value;
unsigned seq1 = sequence;
unsigned seq2 = sequence;
This would defeat the locking mechanism (where the writer increments sequence once before modifying the data, then again when it's done). Readers are entirely lockless, but it's not a "lock-free" algo, because if the writer gets stuck mid-update, the readers can't read anything.
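For context, here is a minimal sketch of what the matching writer might look like (it is not part of the original answer; it reuses the sequence and value globals from above, assumes a single writer, and relies on the default seq_cst ordering of plain assignments to atomics for simplicity):

// Hypothetical writer for the seqlock above: bump sequence to an odd value
// before the update and back to an even value afterwards, so readers can
// detect that they raced with a write. A single writer is assumed; multiple
// writers would additionally need mutual exclusion.
void seqlock_write(long newval) {
    unsigned seq = sequence;   // current (even) sequence number
    sequence = seq + 1;        // odd: a write is in progress
    value = newval;            // update the protected data
    sequence = seq + 2;        // even again: write complete
}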
The barrier within each load function blocks reordering with other things after inlining.
(The C++11 memory model is very weak, but the x86 memory model is strong, only allowing StoreLoad reordering. Blocking compile-time reordering with later loads/stores is sufficient to give you an acquire / sequential-consistency load at runtime. x86: Are memory barriers needed here?)
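As a rough illustration (a sketch, not MSVC's code): the closest standard C++ analogue to _ReadWriteBarrier is std::atomic_signal_fence, which in practice constrains only compiler reordering and emits no fence instruction. Note this is not strictly conforming ISO C++, since the plain load of shared data is formally a data race; it merely mirrors the "plain aligned load + compiler barrier" pattern that the MSVC code relies on given x86's strong memory model.

#include <atomic>

// Sketch of the same pattern in standard C++ (illustrative only).
inline unsigned load_seq_cst_sketch(const volatile unsigned *tgt) {
    unsigned value = *tgt;                                // aligned 32-bit load: a single access on x86
    std::atomic_signal_fence(std::memory_order_seq_cst);  // block compile-time reordering with later code
    return value;
}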
BTW, a better example might be something where some non-atomic variables are read/written after seeing a certain value in an atomic flag. MSVC probably already avoids reordering or merging of atomic accesses, and in the seqlock the data being protected also has to be atomic.
Why don't compilers merge redundant std::atomic writes?
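A sketch of what that better example might look like (not code from the original answer): non-atomic data published via an atomic flag, where the release store and the acquire load are what stop the compiler, and on weakly ordered CPUs the hardware, from reordering the payload accesses across the flag accesses.

#include <atomic>

int payload;                     // deliberately non-atomic data
std::atomic<bool> ready{false};

void producer() {
    payload = 42;                                    // plain store to the data
    ready.store(true, std::memory_order_release);    // must not be reordered before the payload store
}

int consumer() {
    while (!ready.load(std::memory_order_acquire))   // later reads must not be hoisted above this
        ;                                            // spin until the flag is set
    return payload;                                  // guaranteed to see 42
}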
Source: https://stackoverflow.com/questions/49986209/purpose-of-compiler-barrier-on-32bit-read