“pseudo-atomic” operations in C++

北海茫月 · 2021-02-02 17:44

So I'm aware that nothing is atomic in C++. But I'm trying to figure out if there are any "pseudo-atomic" assumptions I can make. The reason is that I want to avoid using mutexes in some simple situations where I only need very weak guarantees.
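
(The question's code examples are truncated above. Judging from the answers below, they were roughly of this shape; this is a hypothetical reconstruction in which the names b, b1, b2, x and the values 7 and 8 come from the answers, and everything else is guessed.)

    volatile bool b = true;   // (1)-(3): exactly one other thread eventually sets b = false

    void wait_loop() {
        while (b) { /* spin */ }   // (1) is b ever observed as anything but true or false?
        bool b1 = b;               // (2) once the loop exits, can b1 == true
        bool b2 = b;               //     while b2 == false?
    }

    volatile int x = 7;       // (4)-(5): another thread concurrently does x = 8
    int read_x() { return x; }     // (5) can this return anything other than 7 or 8?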

7 Answers
  • 2021-02-02 18:23

    There are no threads in standard C++, and threads cannot be implemented as a library (the title of Hans Boehm's well-known paper on the subject).

    Therefore, the standard has nothing to say about the behaviour of programs which use threads. You must look to whatever additional guarantees are provided by your threading implementation.

    That said, in threading implementations I've used:

    (1) yes, you can assume that irrelevant values aren't written to variables; otherwise the whole memory model goes out the window. But be careful: when you say "another thread" never sets b to false, that has to mean anywhere, ever. If it does, that write could perhaps be re-ordered to occur during your loop.

    (2) no, the compiler can re-order the assignments to b1 and b2, so it is possible for b1 to end up true and b2 false. In such a simple case I don't know why it would re-order, but in more complex cases there might be very good reasons.

    [Edit: oops, by the time I got to answering (2) I'd forgotten that b was volatile. Reads from a volatile variable won't be re-ordered, sorry, so yes on a typical threading implementation (if there is any such thing), you can assume that you won't end up with b1 true and b2 false.]

    (3) same as 1. volatile in general has nothing to do with threading at all. However, it has stronger semantics in some implementations (MSVC on Windows, for example), where it may in effect imply memory barriers.

    (4) on an architecture where int writes are atomic, yes, although volatile has nothing to do with it. See also...

    (5) check the docs carefully. Likely yes, and again volatile is irrelevant, because on almost all architectures int writes are atomic. But if an int write is not atomic, then no (and no for the previous question): even if the variable is volatile, you could in principle read some other value. Given the values 7 and 8, though, it would take a pretty weird architecture for the byte containing the relevant bits to be written in two stages; with other values, a partial write is more plausible.

    For a more plausible example, suppose that for some bizarre reason you have a 16-bit int on a platform where only 8-bit writes are atomic. Odd, but legal, and since int must be at least 16 bits you can see how it could come about. Suppose further that your initial value is 255. Then the increment could legally be implemented as:

    • read the old value
    • increment in a register
    • write the most significant byte of the result
    • write the least significant byte of the result.

    A read-only thread that interrupts the incrementing thread between the third and fourth of those steps could see the value 511. If the writes happen in the other order, it could see 0.
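
    A sketch of the arithmetic, for concreteness. This merely simulates the two byte stores in a single thread; it is not itself a race:

    #include <cstdint>
    #include <cstdio>

    int main() {
        uint16_t value = 255;         // 0x00FF, about to be incremented
        uint16_t next  = value + 1;   // 256 == 0x0100, computed in a "register"

        // most significant byte written, least significant byte still old: 0x01FF
        uint16_t msb_first = (next & 0xFF00) | (value & 0x00FF);
        // least significant byte written, most significant byte still old: 0x0000
        uint16_t lsb_first = (value & 0xFF00) | (next & 0x00FF);

        std::printf("%d %d\n", msb_first, lsb_first);  // prints "511 0"
    }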

    An inconsistent value could be left behind permanently if one thread is writing 255, another thread is concurrently writing 256, and the writes get interleaved. Impossible on many architectures, but to know that this won't happen you need to know at least something about the architecture. Nothing in the C++ standard forbids it, because the C++ standard talks about execution being interrupted by a signal, but otherwise has no concept of execution being interrupted by another part of the program, and no concept of concurrent execution. That's why threads aren't just another library - adding threads fundamentally changes the C++ execution model. It requires the implementation to do things differently, as you'll eventually discover if, for example, you use threads under gcc and forget to specify -pthread.

    The same could happen on a platform where aligned int writes are atomic, but unaligned int writes are permitted and not atomic. For example, IIRC on x86, unaligned int writes are not guaranteed atomic if they cross a cache-line boundary. x86 compilers will not mis-align a declared int variable, for this reason and others. But if you play games with structure packing, you could probably provoke an example.
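
    For instance, a sketch of the structure-packing game (GCC, Clang and MSVC all accept #pragma pack; whether the resulting store is atomic is exactly the platform-specific question):

    #include <cstddef>
    #include <cstdio>

    #pragma pack(push, 1)   // suppress padding
    struct Packed {
        char tag;           // one leading byte...
        int  value;         // ...forces this int to a misaligned offset
    };
    #pragma pack(pop)

    int main() {
        std::printf("%zu\n", offsetof(Packed, value));  // prints 1, not the usual 4
        Packed p{};
        p.value = 42;       // plain store to a misaligned int: atomicity not guaranteed
    }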

    So: pretty much any implementation will give you the guarantees you need, but might do so in quite a complicated way.

    In general, I've found that it is not worth trying to rely on platform-specific guarantees about memory access that I don't fully understand in order to avoid mutexes. Use a mutex, and if that's too slow, use a high-quality lock-free structure (or implement a design for one) written by someone who really knows the architecture and compiler. It will probably be correct, and a correct one will probably outperform anything I invent myself.
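
    For reference, the mutex version of the question's flag is short (C++11; the function names are hypothetical):

    #include <mutex>

    std::mutex m;
    bool b = true;   // guarded by m

    void stop() {
        std::lock_guard<std::mutex> lock(m);
        b = false;
    }

    bool still_running() {
        std::lock_guard<std::mutex> lock(m);
        return b;
    }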

  • 2021-02-02 18:28

    If your C++ implementation supplies the library of atomic operations specified by n2145 or some variant thereof, you can presumably rely on it. Otherwise, you cannot in general rely on "anything" about atomicity at the language level, since multitasking of any kind (and therefore atomicity, which deals with multitasking) is not specified by the existing C++ standard.

  • 2021-02-02 18:37

    It's generally a really, really bad idea to depend on this, as you could end up with bad things happening, and only on some architectures. The best solution is to use a guaranteed-atomic API, for example the Windows Interlocked API.
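
    A minimal sketch of that approach (Windows-only; InterlockedIncrement is an atomic read-modify-write with a full memory barrier):

    #include <windows.h>

    volatile LONG counter = 0;

    void bump() {
        InterlockedIncrement(&counter);   // atomic increment, no torn updates
    }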

  • 2021-02-02 18:40

    My answer is going to be frustrating: No, No, No, No, and No.

    1-4) The compiler is allowed to do ANYTHING it pleases with a variable it writes to. It may store temporary values in it, so long as it ends up doing something that has the same effect as that thread executing in a vacuum. ANYTHING is valid.

    5) Nope, no guarantee. If a variable is not atomic, and you write to it on one thread while reading or writing it on another, that is a data race. The standard declares data races to be undefined behavior, and absolutely anything goes. That said, you will be hard pressed to find a compiler that does not give you 7 or 8, but it IS legal for a compiler to give you something else.

    I always refer people to this highly comical explanation of data races:

    http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong
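
    The race, and with it the undefined behavior, disappears once the variable is made atomic. A minimal C++11 sketch, with hypothetical names:

    #include <atomic>

    std::atomic<int> x{7};   // counterpart to the question's plain int

    void writer() { x.store(8, std::memory_order_relaxed); }    // indivisible store
    int  reader() { return x.load(std::memory_order_relaxed); } // sees exactly 7 or 8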

  • 2021-02-02 18:41

    Maybe this thread is ancient, but the C++11 standard DOES have a thread library and also a vast atomic library for atomic operations. The purpose is specifically concurrency support and the avoidance of data races. The relevant header is <atomic>.
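
    A minimal illustration of the two headers together:

    #include <atomic>
    #include <thread>

    std::atomic<bool> ready{false};

    int main() {
        std::thread t([] {
            while (!ready.load(std::memory_order_acquire)) { /* spin */ }
        });
        ready.store(true, std::memory_order_release);   // no data race: ready is atomic
        t.join();
    }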

  • 2021-02-02 18:42

    Most of the answers correctly address the CPU memory ordering issues you're going to experience, but none have detailed how the compiler can thwart your intentions by re-ordering your code in ways that break your assumptions.

    Consider an example taken from this post:

    volatile int ready;       
    int message[100];      
    
    void foo(int i) 
    {      
        message[i/10] = 42;      
        ready = 1;      
    }
    

    At -O2 and above, recent versions of GCC and Intel C/C++ (don't know about VC++) will do the store to ready first, so it can be overlapped with computation of i/10 (volatile does not save you!):

        leaq    _message(%rip), %rax
        movl    $1, _ready(%rip)      ; <-- whoa Nelly!
        movq    %rsp, %rbp
        sarl    $2, %edx
        subl    %edi, %edx
        movslq  %edx,%rdx
        movl    $42, (%rax,%rdx,4)
    

    This isn't a bug, it's the optimizer exploiting CPU pipelining. If another thread is waiting on ready before accessing the contents of message then you have a nasty and obscure race.
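
    For comparison, C++11 atomics let the same code state the intended ordering directly. A sketch (std::atomic was not yet widely available when this answer was written):

    #include <atomic>

    std::atomic<int> ready{0};
    int message[100];

    void foo(int i)
    {
        message[i/10] = 42;
        ready.store(1, std::memory_order_release);  // the message store cannot sink below this
    }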

    Employ compiler barriers to ensure your intent is honored. One example, which also exploits the relatively strong ordering of x86, is the pair of release/consume wrappers found in Dmitriy Vyukov's single-producer single-consumer queue posted here:

    // load with 'consume' (data-dependent) memory ordering 
    // NOTE: x86 specific, other platforms may need additional memory barriers
    template<typename T> 
    T load_consume(T const* addr) 
    {  
      T v = *const_cast<T const volatile*>(addr); 
      __asm__ __volatile__ ("" ::: "memory"); // compiler barrier 
      return v; 
    } 
    
    // store with 'release' memory ordering 
    // NOTE: x86 specific, other platforms may need additional memory barriers
    template<typename T> 
    void store_release(T* addr, T v) 
    { 
      __asm__ __volatile__ ("" ::: "memory"); // compiler barrier 
      *const_cast<T volatile*>(addr) = v; 
    } 
    

    I suggest that if you are going to venture into the realm of concurrent memory access, you use a library that will take care of these details for you. While we all wait for n2145 and std::atomic, check out Threading Building Blocks' tbb::atomic or the upcoming boost::atomic.

    Besides correctness, these libraries can simplify your code and clarify your intent:

    // shared between both threads
    std::atomic<int> foo{0};  // or tbb::atomic, boost::atomic, etc.

    // thread 1
    foo.store(1, std::memory_order_release);

    // thread 2
    int tmp = foo.load(std::memory_order_acquire);


    With explicit memory ordering, foo's inter-thread relationship is clear.
