Atomicity of 32bit read on multicore CPU

温柔的废话 2021-01-18 09:46

(Note: I've added tags to this question based on where I feel the people likely to be able to help will be, so please don't shout :))

In my VS 2017 64bit p

4 Answers
  • 2021-01-18 10:07

    As I posted here, this question was never about protecting a critical section of code, it was purely about avoiding torn read/writes. user3386109 posted a comment here which I ended up using, but declined posting it as an answer here. Thus I am providing the solution I ended up using for completeness of this question; maybe it will help someone in the future.

    The following shows the atomic setting and testing of m_lClosed:

    long m_lClosed = 0;
    

    Thread 1

    // Set flag to closed
    if (InterlockedCompareExchange(&m_lClosed, 1, 0) == 0)
        cout << "Closed OK!\n";
    

    Thread 2

    This code replaces if (!m_lClosed):

    if (InterlockedCompareExchange(&m_lClosed, 0, 0) == 0)
        cout << "Not closed!";
    
  • 2021-01-18 10:11

    In C++11, an unsynchronized access to a non-atomic object (such as m_lClosed) is undefined behavior.

    The standard provides all the facilities you need to write this correctly; you do not need non-portable functions such as InterlockedCompareExchange. Instead, simply define your variable as atomic:

    std::atomic<bool> m_lClosed{false};
    
    // Writer thread...
    bool expected = false;
    m_lClosed.compare_exchange_strong(expected, true);
    
    // Reader...
    if (m_lClosed.load()) { /* ... */ }
    

    This is more than sufficient (it forces sequential consistency, which might be expensive). In some cases it might be possible to generate slightly faster code by relaxing the memory order on the atomic operations, but I would not worry about that.
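    To illustrate the point about relaxing the memory order, a minimal sketch (the names g_closed, close_once, and is_closed are mine, not from the question): for a simple closed flag, a release store paired with an acquire load is enough, and is never weaker than the default sequential consistency.

```cpp
#include <atomic>

// Hypothetical flag mirroring m_lClosed from the question.
std::atomic<bool> g_closed{false};

// Writer: a release store is sufficient to publish the flag
// (and any data written before it) to other threads.
void close_once() {
    g_closed.store(true, std::memory_order_release);
}

// Reader: an acquire load pairs with the release store above,
// so data written before close_once() is visible after it returns true.
bool is_closed() {
    return g_closed.load(std::memory_order_acquire);
}
```

    On x86 this acquire/release pair costs nothing extra over plain loads and stores; the gain over seq_cst shows up mainly on weaker architectures such as ARM.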

  • 2021-01-18 10:16

    OK so as it turns out this really isn't necessary; this answer explains in detail why we don't need to use any interlocked operations for a simple read/write (but we do for a read-modify-write).
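    The read/write versus read-modify-write distinction this answer draws can be sketched portably with std::atomic (my sketch, not the Win32 code from the question):

```cpp
#include <atomic>
#include <cstdint>

std::atomic<std::int32_t> counter{0};

// Plain read and write: no interlocked/LOCK'd instruction is needed on
// x86, because an aligned 32-bit MOV is already atomic by itself.
void set_value(std::int32_t v) { counter.store(v); }
std::int32_t get_value() { return counter.load(); }

// Read-modify-write: two threads each doing load, add, store would race
// (both could read the same old value), so the whole step must be one
// atomic RMW, which compiles to LOCK XADD on x86.
std::int32_t increment() { return counter.fetch_add(1) + 1; }
```

    In other words, the interlocked machinery buys you nothing for a lone read or a lone write; it matters only when one operation must observe and update the value indivisibly.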

  • 2021-01-18 10:20

    It really depends on your compiler and the CPU you are running on.

    x86 CPUs will atomically read 32-bit values without the LOCK prefix if the memory address is properly aligned. However, you most likely will need some sort of memory barrier to control the CPUs out-of-order execution if the variable is used as a lock/count of some other related data. Data that is not aligned might not be read atomically, especially if the value straddles a page boundary.

    If you are not hand-coding assembly, you also need to worry about the compiler's reordering optimizations.

    When compiling with Visual C++, any variable marked volatile is subject to ordering constraints in the compiler (and possibly in the generated machine code):

    The _ReadBarrier, _WriteBarrier, and _ReadWriteBarrier compiler intrinsics prevent compiler re-ordering only. With Visual Studio 2003, volatile to volatile references are ordered; the compiler will not re-order volatile variable access. With Visual Studio 2005, the compiler also uses acquire semantics for read operations on volatile variables and release semantics for write operations on volatile variables (when supported by the CPU).

    Microsoft specific volatile keyword enhancements:

    When the /volatile:ms compiler option is used—by default when architectures other than ARM are targeted—the compiler generates extra code to maintain ordering among references to volatile objects in addition to maintaining ordering to references to other global objects. In particular:

    • A write to a volatile object (also known as volatile write) has Release semantics; that is, a reference to a global or static object that occurs before a write to a volatile object in the instruction sequence will occur before that volatile write in the compiled binary.

    • A read of a volatile object (also known as volatile read) has Acquire semantics; that is, a reference to a global or static object that occurs after a read of volatile memory in the instruction sequence will occur after that volatile read in the compiled binary.

    This enables volatile objects to be used for memory locks and releases in multithreaded applications.


    For architectures other than ARM, if no /volatile compiler option is specified, the compiler performs as if /volatile:ms were specified; therefore, for architectures other than ARM we strongly recommend that you specify /volatile:iso, and use explicit synchronization primitives and compiler intrinsics when you are dealing with memory that is shared across threads.

    Microsoft provides compiler intrinsics for most of the Interlocked* functions and they will compile to something like LOCK XADD ... instead of a function call.

    Until "recently", C/C++ had no support for atomic operations or threads in general but this changed in C11/C++11 where atomic support was added. Using the <atomic> header and its types/functions/classes moves the alignment and reordering responsibility to the compiler so you don't have to worry about that. You still have to make a choice regarding memory barriers and this determines the machine code generated by the compiler. With relaxed memory order, the load atomic operation will most likely end up as a simple MOV instruction on x86. A stricter memory order can add a fence and possibly the LOCK prefix if the compiler determines that the target platform requires it.
