Question
I have been stepping through the function calls that are involved when I assign to an atomic_long type on VS2017 with a 64-bit project. I specifically wanted to see what happens when I copy an atomic_long into a non-atomic variable, and whether there is any locking around it.
atomic_long ll = 10;
long t2 = ll;
Ultimately it ends up with this call (I've removed some code that was #ifdef'ed out):
inline _Uint4_t _Load_seq_cst_4(volatile _Uint4_t *_Tgt)
{   /* load from *_Tgt atomically with sequentially consistent memory order */
    _Uint4_t _Value;
    _Value = *_Tgt;
    _Compiler_barrier();
    return (_Value);
}
Now, I've read on MSDN that a plain read of a 32-bit value will be atomic:
Simple reads and writes to properly-aligned 32-bit variables are atomic operations.
...which explains why there is no Interlocked function for just reading; only those for changing/comparing. What I'd like to know is what the _Compiler_barrier() bit is doing. This is #defined as
__MACHINE(void _ReadWriteBarrier(void))
...and I've found on MSDN again that this
Limits the compiler optimizations that can reorder memory accesses across the point of the call.
But I don't get this, as there are no other memory accesses apart from the return call; surely the compiler wouldn't move the assignment below that, would it?
Can someone please clarify the purpose of this barrier?
Answer 1:
_Load_seq_cst_4 is an inline function. The compiler barrier is there to block reordering with later code in the calling function that this inlines into.
For example, consider reading a SeqLock. (Over-simplified from this actual implementation).
#include <atomic>

std::atomic<unsigned> sequence;
std::atomic_long value;

long seqlock_try_read() {
    // retry-loop: keep re-reading until we observe a consistent snapshot
    for (;;) {
        unsigned seq1 = sequence;
        long tmpval = value;
        unsigned seq2 = sequence;
        if (seq1 == seq2 && (seq1 & 1) == 0)
            return tmpval;   // sequence was even and unchanged: tmpval is consistent
        // a writer was modifying it; retry the loop
    }
}
If we didn't block compile-time reordering, the compiler could merge both reads of sequence into a single access, perhaps like this:
long tmpval = value;
unsigned seq1 = sequence;
unsigned seq2 = sequence;
This would defeat the locking mechanism (where the writer increments sequence once before modifying the data, then again when it's done). Readers are entirely lockless, but it's not a "lock-free" algo, because if the writer gets stuck mid-update, the readers can't read anything.
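For context, here is a minimal sketch of what the matching writer might look like (it is not part of the original answer; it reuses the sequence and value globals from above, assumes a single writer, and relies on the default seq_cst ordering of plain assignments to atomics for simplicity):

// Hypothetical writer for the seqlock above: bump sequence to an odd value
// before the update and back to an even value afterwards, so readers can
// detect that they raced with a write. A single writer is assumed; multiple
// writers would additionally need mutual exclusion.
void seqlock_write(long newval) {
    unsigned seq = sequence;   // current (even) sequence number
    sequence = seq + 1;        // odd: a write is in progress
    value = newval;            // update the protected data
    sequence = seq + 2;        // even again: write complete
}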
The barrier within each load function blocks reordering with other things after inlining.
(The C++11 memory model is very weak, but the x86 memory model is strong, only allowing StoreLoad reordering. Blocking compile-time reordering with later loads/stores is sufficient to give you an acquire / sequential-consistency load at runtime. x86: Are memory barriers needed here?)
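As a rough illustration (a sketch, not MSVC's code): the closest standard C++ analogue to _ReadWriteBarrier is std::atomic_signal_fence, which in practice constrains only compiler reordering and emits no fence instruction. Note this is not strictly conforming ISO C++, since the plain load of shared data is formally a data race; it merely mirrors the "plain aligned load + compiler barrier" pattern that the MSVC code relies on given x86's strong memory model.

#include <atomic>

// Sketch of the same pattern in standard C++ (illustrative only).
inline unsigned load_seq_cst_sketch(const volatile unsigned *tgt) {
    unsigned value = *tgt;                                // aligned 32-bit load: a single access on x86
    std::atomic_signal_fence(std::memory_order_seq_cst);  // block compile-time reordering with later code
    return value;
}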
BTW, a better example might be something where some non-atomic variables are read/written after seeing a certain value in an atomic flag. MSVC probably already avoids reordering or merging of atomic accesses, and in the seqlock the data being protected also has to be atomic.
Why don't compilers merge redundant std::atomic writes?
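A sketch of what that better example might look like (not code from the original answer): non-atomic data published via an atomic flag, where the release store and the acquire load are what stop the compiler, and on weakly ordered CPUs the hardware, from reordering the payload accesses across the flag accesses.

#include <atomic>

int payload;                     // deliberately non-atomic data
std::atomic<bool> ready{false};

void producer() {
    payload = 42;                                    // plain store to the data
    ready.store(true, std::memory_order_release);    // must not be reordered before the payload store
}

int consumer() {
    while (!ready.load(std::memory_order_acquire))   // later reads must not be hoisted above this
        ;                                            // spin until the flag is set
    return payload;                                  // guaranteed to see 42
}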
Source: https://stackoverflow.com/questions/49986209/purpose-of-compiler-barrier-on-32bit-read