Is std::atomic_compare_exchange_weak thread-unsafe by design?

北海茫月 2020-12-28 17:47

It was brought up on cppreference atomic_compare_exchange Talk page that the existing implementations of std::atomic_compare_exchange_weak compute the boolea

5 answers
  • 2020-12-28 18:35

    TL;DR: atomic_compare_exchange_weak is safe by design, but actual implementations are buggy.

    Here's the code that Clang actually generates for this little snippet:

    struct node {
      int data;
      node* next;
    };
    
    std::atomic<node*> head;
    
    void push(int data) {
      node* new_node = new node{data};
      new_node->next = head.load(std::memory_order_relaxed);
      while (!head.compare_exchange_weak(new_node->next, new_node,
          std::memory_order_release, std::memory_order_relaxed)) {}
    }
    

    Result:

      movl  %edi, %ebx
      # Allocate memory
      movl  $16, %edi
      callq _Znwm
      movq  %rax, %rcx
      # Initialize with data and 0
      movl  %ebx, (%rcx)
      movq  $0, 8(%rcx) # dead store, should have been optimized away
      # Overwrite next with head.load
      movq  head(%rip), %rdx
      movq  %rdx, 8(%rcx)
      .align  16, 0x90
    .LBB0_1:                                # %while.cond
                                            # =>This Inner Loop Header: Depth=1
      # put value of head into comparand/result position
      movq  %rdx, %rax
      # atomic operation here, compares second argument to %rax, stores first argument
      # in second if same, and second in %rax otherwise
      lock
      cmpxchgq  %rcx, head(%rip)
      # unconditionally write old value back to next - wait, what?
      movq  %rax, 8(%rcx)
      # check if cmpxchg modified the result position
      cmpq  %rdx, %rax
      movq  %rax, %rdx
      jne .LBB0_1
    

    The comparison is perfectly safe: it's just comparing registers. However, the whole operation is not safe.

    The critical point is this: the description of compare_exchange_(weak|strong) says:

    Atomically [...] if true, replaces the contents of the memory pointed to by this with that in desired, and if false, updates the contents of the memory in expected with the contents of the memory pointed to by this

    Or in pseudo-code:

    if (*this == expected)
      *this = desired;
    else
      expected = *this;
    

    Note that expected is only written to if the comparison is false, and *this is only written to if comparison is true. The abstract model of C++ does not allow an execution where both are written to. This is important for the correctness of push above, because if the write to head happens, suddenly new_node points to a location that is visible to other threads, which means other threads can start reading next (by accessing head->next), and if the write to expected (which aliases new_node->next) also happens, that's a race.

    And Clang writes to new_node->next unconditionally. In the case where the comparison is true, that's an invented write.

    This is a bug in Clang. I don't know whether GCC does the same thing.
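
    For reference, the rule that expected is written only on failure can be observed in a single thread. This is a minimal sketch, not taken from the original discussion: CasOutcome and try_cas are hypothetical names, and the strong form is used only so that the outcome is deterministic.

```cpp
#include <atomic>

// Hypothetical record of one compare-exchange attempt.
struct CasOutcome {
  bool succeeded;      // value returned by the compare-exchange
  int expected_after;  // what `expected` holds afterwards
  int object_after;    // what the atomic holds afterwards
};

// Runs one strong compare-exchange on a fresh atomic holding `initial`.
// The strong form is used so there is no spurious failure to account for.
CasOutcome try_cas(int initial, int expected, int desired) {
  std::atomic<int> object{initial};
  const bool ok = object.compare_exchange_strong(expected, desired);
  return CasOutcome{ok, expected, object.load()};
}
```

    try_cas(1, 1, 2) succeeds and leaves expected at 1, while try_cas(1, 5, 2) fails and updates expected to 1. A single-threaded check like this cannot detect the invented write, because on success that write stores the value expected already holds; the write only becomes observable as a race once other threads can reach the node.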

    In addition, the wording of the standard is suboptimal. It claims that the entire operation must happen atomically, but this is impossible, because expected is not an atomic object; writes to there cannot happen atomically. What the standard should say is that the comparison and the write to *this happen atomically, but the write to expected does not. But this isn't that bad, because no one really expects that write to be atomic anyway.

    So there should be a bug report for Clang (and possibly GCC), and a defect report for the standard.

  • 2020-12-28 18:41

    [...]

    break CAS loops such as Concurrency in Action's listing 7.2:

    while(!head.compare_exchange_weak(new_node->next, new_node));
    

    The specification (29.6.5[atomics.types.operations.req]/21-22) seems to imply that the result of the comparison must be a part of the atomic operation:

    [...]

    The issue with this code and the specification is not whether the atomicity of compare_exchange needs to extend beyond just the comparison and exchange itself to returning the result of the comparison or assigning to the expected parameter. That is, the code may still be correct without the store to expected being atomic.

    What causes the above code to be potentially racy is when implementations write to the expected parameter after a successful exchange may have been observed by other threads. The code is written with the expectation that in the case when the exchange is successful there is no write on expected to produce a race.

    The spec, as written, does appear to guarantee this expected behavior. (And indeed can be read as making the much stronger guarantee you describe, that the entire operation is atomic.) According to the spec, compare_exchange_weak:

    Atomically, compares the contents of the memory pointed to by object or by this for equality with that in expected, and if true, replaces the contents of the memory pointed to by object or by this with that in desired, and if false, updates the contents of the memory in expected with the contents of the memory pointed to by object or by this. [n4140 § 29.6.5 / 21] (N.B. The wording is unchanged between C++11 and C++14)

    The problem is that it seems as though the actual language of the standard is stronger than the original intent of the proposal. Herb Sutter is saying that Concurrency in Action's usage was never really intended to be supported, and that updating expected was only intended to be done on local variables.

    I don't see any current defect report on this. [See second update below] If in fact this language is stronger than intended then presumably one will get filed. Either C++11's wording will be updated to guarantee the above code's expected behavior, thus making current implementations non-conformant, or the new wording will not guarantee this behavior, making the above code potentially result in undefined behavior. In that case I guess Anthony's book will need updating. What the committee will do about this, and whether or not actual implementations conform to the original intent (rather than the actual wording of the spec) is still an open question. [See update below]

    For the purposes of writing code in the meantime, you'll have to take into account the actual behavior of implementations, whether conformant or not. Existing implementations may be 'buggy' in the sense that they don't implement the exact wording of the ISO spec, but they do operate as their implementers intended and they can be used to write thread safe code. [See update below]

    So to answer your questions directly:

    but is it actually implementable?

    I believe that the actual wording of the spec is not reasonably implementable (and that the actual wording makes guarantees stronger even than Anthony's just::thread library provides; for example, the actual wording appears to require atomic operations on a non-atomic object). Anthony's slightly weaker interpretation, that the assignment to expected need not be atomic but must be conditioned on the failure of the exchange, is obviously implementable. Herb's even weaker interpretation is also obviously implementable, as that's what most libraries actually implement. [See update below]

    Is std::atomic_compare_exchange_weak thread-unsafe by design?

    The operation is not thread unsafe no matter whether the operation makes guarantees as strong as the actual wording of the spec or as weak as Herb Sutter indicates. It's simply that correct, thread safe usage of the operation depends on what is guaranteed. The example code from Concurrency in Action is an unsafe usage of a compare_exchange that only offers Herb's weak guarantee, but it could be written to work correctly with Herb's implementation. That could be done like so:

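    node *expected_head = head.load();
    new_node->next = expected_head;  // link to the value the first exchange compares against
    while(!head.compare_exchange_weak(expected_head, new_node)) {
      new_node->next = expected_head;  // exchange failed: new_node is still private, relink and retry
    }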
    

    With this change the 'spurious' writes to expected are simply made to a local variable, and no longer produce any races. The write to new_node->next is now conditional upon the exchange having failed, and thus new_node->next is not visible to any other thread and may be safely updated. This code sample is safe both under current implementations and under stronger guarantees, so it should be future proof to any updates to C++11's atomics that resolve this issue.


    Update:

    Actual implementations (MSVC, gcc, and clang at least) have been updated to offer the guarantees under Anthony Williams' interpretation; that is, they have stopped inventing writes to expected in the case that the exchange succeeds.

    https://llvm.org/bugs/show_bug.cgi?id=18899

    https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60272

    https://connect.microsoft.com/VisualStudio/feedback/details/819819/std-atomic-compare-exchange-weak-has-spurious-write-which-can-cause-race-conditions

    Update 2:

    A defect report on this issue has been filed with the C++ committee. Judging from the currently proposed resolution, the committee does want to make stronger guarantees than those provided by the implementations you checked (but not as strong as the current wording, which appears to guarantee atomic operations on non-atomic objects). The draft for the next C++ standard (C++1z or 'C++17') has not yet adopted the improved wording.

    Update 3: C++17 adopted the proposed resolution.

  • 2020-12-28 18:41

    Quoting Duncan Forster from the linked page:

    The important thing to remember is that the hardware implementation of CAS only returns 1 value (the old value) not two (old plus boolean)

    So there's one instruction - the (atomic) CAS - which actually operates on memory, and then another instruction to convert the (atomically-assigned) result into the expected boolean.

    Since the value in %rax was set atomically and can't then be affected by another thread, there is no race here.

    The quote is inaccurate anyway, since ZF is also set depending on the CAS result (i.e., the instruction does return both the old value and the boolean). The fact that the flag isn't used might be a missed optimisation, or the cmpq might be faster, but it doesn't affect correctness.


    For reference, consider decomposing compare_exchange_weak like this pseudocode:

    T compare_exchange_weak_value(atomic<T> *obj, T *expected, T desired) {
        // setup ...
        lock cmpxchgq   %rcx, (%rsp) // actual CAS
        return %rax; // actual destination value
    }
    
    bool compare_exchange_weak_bool(atomic<T> *obj, T *expected, T desired) {
        // CAS is atomic
        T actual = compare_exchange_weak_value(obj, expected, desired);
        // now we figure out if it worked
        return actual == *expected;
    }
    

    Do you agree the CAS is properly atomic?


    If the unconditional store to expected is really what you wanted to ask about (instead of the perfectly safe comparison), I agree with Sebastian that it's a bug.

    For reference, you can work around it by forcing the unconditional store into a local, and making the potentially-visible store conditional again:

    struct node {
      int data;
      node* next;
    };
    
    std::atomic<node*> head;
    
    void push(int data) {
      node* new_node = new node{data};
      node* cur_head = head.load(std::memory_order_relaxed);
      do {
        new_node->next = cur_head;
      } while (!head.compare_exchange_weak(cur_head, new_node,
                std::memory_order_release, std::memory_order_relaxed));
    }
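
    As a usage sketch for the workaround above, several threads can push concurrently and the nodes can be counted once they have all joined. This is only an illustration under stated assumptions: push here takes the list head as a parameter rather than using the global, and push_from_threads is a hypothetical helper that is not part of the original answer.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

struct node {
  int data;
  node* next;
};

// Same shape as the workaround: compare_exchange_weak only ever writes to the
// local cur_head, and new_node->next is stored before the node can be published.
void push(std::atomic<node*>& head, int data) {
  node* new_node = new node{data, nullptr};
  node* cur_head = head.load(std::memory_order_relaxed);
  do {
    new_node->next = cur_head;
  } while (!head.compare_exchange_weak(cur_head, new_node,
            std::memory_order_release, std::memory_order_relaxed));
}

// Hypothetical helper: pushes from several threads, then counts and frees the nodes.
std::size_t push_from_threads(std::size_t threads, std::size_t per_thread) {
  std::atomic<node*> head{nullptr};
  std::vector<std::thread> workers;
  for (std::size_t t = 0; t < threads; ++t) {
    workers.emplace_back([&head, per_thread] {
      for (std::size_t i = 0; i < per_thread; ++i) {
        push(head, static_cast<int>(i));
      }
    });
  }
  for (std::thread& worker : workers) {
    worker.join();
  }
  // All writers have joined, so the list can be walked without further synchronization.
  std::size_t count = 0;
  node* cur = head.load(std::memory_order_acquire);
  while (cur != nullptr) {
    node* next = cur->next;
    delete cur;
    cur = next;
    ++count;
  }
  return count;
}
```

    With 4 threads pushing 1000 nodes each, the count should come out to 4000. A passing run does not prove the absence of a race; it only shows that no push was lost in that run.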
    
  • 2020-12-28 18:42

    Those people don't seem to understand either the standard or the instructions.

    First of all, std::atomic_compare_exchange_weak is not thread-unsafe by design. That is complete nonsense. The design very clearly defines what the function does and which guarantees (including atomicity and memory ordering) it must provide.
    Whether your program that uses this function is thread-safe as a whole is a different matter, but the function's semantics per se are certainly correct in the sense of an atomic compare-exchange (you can still write thread-unsafe code using any available thread-safe primitive, but that is a totally different story).

    This particular function implements the "weak" version of a thread-safe compare-exchange operation which differs from the "non weak" version in that the implementation is allowed to generate code which may spuriously fail, if that gives a performance benefit (irrelevant on x86). Weak does not mean it's worse, it only means that it is allowable to fail more often on some platforms, if that gives an overall performance benefit.
    The implementation is of course still required to work correctly. That is, if the compare-exchange fails -- whether by concurrency or spuriously -- it must be correctly reported back as having failed.
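
    Because the weak form may fail spuriously, it is normally used in a retry loop. A minimal sketch, assuming a hypothetical helper named fetch_multiply (std::atomic provides no such member), might look like this:

```cpp
#include <atomic>

// Hypothetical helper: atomically multiplies `value` by `factor` and returns
// the value that was stored before the multiplication.
int fetch_multiply(std::atomic<int>& value, int factor) {
  int expected = value.load(std::memory_order_relaxed);
  // On failure, whether caused by contention or spurious, `expected` is
  // refreshed with the value currently stored, and the product is recomputed
  // on the next iteration.
  while (!value.compare_exchange_weak(expected, expected * factor)) {
  }
  return expected;
}
```

    Here expected is a local variable, so any write the implementation makes to it cannot be observed by another thread.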

    Second, the code generated by existing implementations has no bearing on the correctness or thread-safety of std::atomic_compare_exchange_weak. At best, if the generated instructions do not work correctly, this is an implementation issue, but it has nothing to do with the language construct. The language standard defines what behavior an implementation must provide; it is not responsible for implementations actually doing it correctly.

    Third, there is no problem in the generated code. The x86 CMPXCHG instruction has a well-defined mode of operation. It compares the actual value with the expected value, and if the comparison is successful, it performs the swap. You know whether or not the operation was successful either by looking at EAX (or RAX in x64) or by the state of ZF.
    What matters is that the atomic compare-exchange is atomic, and that's the case. Whatever you do with the result afterwards need not be atomic (in your case, the CMP), since the state does not change any more. Either the swap was successful at that point, or it has failed. In either case, it's already "history".

    std::atomic_compare_exchange_weak has different semantics than the underlying instruction, it returns a bool value. Therefore, you cannot always expect a 1:1 mapping to instructions. The compiler may have to generate additional instructions (and different ones depending on how you consume the result) to implement these semantics, but it really makes no difference for correctness.

    The only thing one could arguably complain about is the fact that instead of directly using the already present state of ZF (with a Jcc or CMOVcc), it performs another comparison. But this is a performance issue (1 cycle wasted), not a correctness issue.

  • 2020-12-28 18:47

    I was the one who originally found this bug. For the last few days I have been e-mailing Anthony Williams regarding this issue and vendor implementations. I didn't realize Cubbi had raised a Stack Overflow question. It's not just Clang or GCC; every vendor is broken (all that matter, anyway). Anthony Williams, also the author of Just::Thread (a C++11 thread and atomic library), confirmed that his library is implemented correctly (the only known correct implementation).

    Anthony has raised a GCC bug report http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60272

    Simple example:

       #include <atomic>
       struct Node { Node* next; };
       void Push(std::atomic<Node*>& head, Node* node)
       {
           node->next = head.load();
           while(!head.compare_exchange_weak(node->next, node))
               ;
       }
    

    g++ 4.8 [assembler]

           mov    rdx, rdi
           mov    rax, QWORD PTR [rdi]
           mov    QWORD PTR [rsi], rax
       .L3:
           mov    rax, QWORD PTR [rsi]
           lock cmpxchg    QWORD PTR [rdx], rsi
           mov    QWORD PTR [rsi], rax !!!!!!!!!!!!!!!!!!!!!!!
           jne    .L3
           rep; ret
    

    clang 3.3 [assembler]

           movq    (%rdi), %rcx
           movq    %rcx, (%rsi)
       .LBB0_1:
           movq    %rcx, %rax
           lock
           cmpxchgq    %rsi, (%rdi)
           movq    %rax, (%rsi) !!!!!!!!!!!!!!!!!!!!!!!
           cmpq    %rcx, %rax !!!!!!!!!!!!!!!!!!!!!!!
           movq    %rax, %rcx
           jne    .LBB0_1
           ret
    

    icc 13.0.1 [assembler]

           movl      %edx, %ecx
           movl      (%rsi), %r8d
           movl      %r8d, %eax
           lock
           cmpxchg   %ecx, (%rdi)
           movl      %eax, (%rsi) !!!!!!!!!!!!!!!!!!!!!!!
           cmpl      %eax, %r8d !!!!!!!!!!!!!!!!!!!!!!!
           je        ..B1.7
       ..B1.4:
           movl      %edx, %ecx
           movl      %eax, %r8d
           lock
           cmpxchg   %ecx, (%rdi)
           movl      %eax, (%rsi) !!!!!!!!!!!!!!!!!!!!!!!
           cmpl      %eax, %r8d !!!!!!!!!!!!!!!!!!!!!!!
           jne       ..B1.4
       ..B1.7:
           ret
    

    Visual Studio 2012 [No need to check assembler, MS uses _InterlockedCompareExchange !!!]

       inline int _Compare_exchange_seq_cst_4(volatile _Uint4_t *_Tgt, _Uint4_t *_Exp, _Uint4_t _Value)
       {    /* compare and exchange values atomically with
           sequentially consistent memory order */
           int _Res;
           _Uint4_t _Prev = _InterlockedCompareExchange((volatile long *)_Tgt, _Value, *_Exp);
           if (_Prev == *_Exp) /* !!!!!!!!!!!!!!!!!!!!!!! */
               _Res = 1;
           else
           { /* copy old value */
               _Res = 0;
               *_Exp = _Prev;
           }
           return (_Res);
       }
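
    For comparison, here is a sketch of a wrapper that does not touch expected once the exchange has succeeded. It is not any vendor's actual code: cas_old_value and cas_conditional_store are hypothetical names, the hardware-style primitive is only modelled on top of std::atomic, and T is assumed to be a simple type such as an integer or a pointer.

```cpp
#include <atomic>

// Models a hardware-style CAS that reports only the previously stored value.
// `wanted` is passed by value, so the library call can only write to a local copy.
template <class T>
T cas_old_value(std::atomic<T>& object, T wanted, T desired) {
  object.compare_exchange_strong(wanted, desired);
  return wanted;  // unchanged on success, the observed value on failure
}

// Derives the boolean result and updates `expected` only when the exchange failed.
template <class T>
bool cas_conditional_store(std::atomic<T>& object, T& expected, T desired) {
  const T wanted = expected;  // single read of expected, before the exchange
  const T observed = cas_old_value(object, wanted, desired);
  if (observed == wanted) {
    return true;  // success: expected is neither read nor written again
  }
  expected = observed;  // failure: the object was not modified, nothing was published
  return false;
}
```

    The two points that differ from the listings above are that expected is read once before the exchange, and that the store to expected sits on the failure path only.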
    