Can non-atomic-load be reordered after atomic-acquire-load?

灰色年华 2021-01-02 22:28

As of C++11 there are six memory orders, and the documentation says this about std::memory_order_acquire:

  • http://en.cppreference.com/w/cpp/
4 answers
  • 2021-01-02 23:13

    Just to answer your headline question: yes, loads that come earlier in program order (whether atomic or non-atomic) can be re-ordered after an atomic acquire load. Similarly, stores that come later in program order can be re-ordered before an atomic release store.

    However, an atomic store is not necessarily allowed to be re-ordered after an atomic load, or vice versa (an atomic load re-ordered before an atomic store).

    See Herb Sutter's talk around 44:00.
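The store/load case can be illustrated with the classic "store buffering" litmus test. This is only a sketch under assumptions: the names x, y, r1, r2 and run_store_buffering() are made up for illustration and do not come from the question. With std::memory_order_seq_cst on all four operations, the outcome r1 == 0 && r2 == 0 is ruled out, because that outcome would require a store to move after the following load. With release stores and acquire loads instead, that outcome would be permitted.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>

// Hypothetical litmus test: each thread stores to one variable,
// then loads the other one.
std::pair<int, int> run_store_buffering() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;

    std::thread t1([&] {
        x.store(1, std::memory_order_seq_cst);   // store first...
        r1 = y.load(std::memory_order_seq_cst);  // ...then load the other variable
    });
    std::thread t2([&] {
        y.store(1, std::memory_order_seq_cst);
        r2 = x.load(std::memory_order_seq_cst);
    });
    t1.join();
    t2.join();

    // The single total order of seq_cst operations forbids both loads
    // from reading the initial value 0.
    assert(!(r1 == 0 && r2 == 0));
    return {r1, r2};
}
```

Calling run_store_buffering() in a loop never trips the assertion. Swapping in memory_order_release and memory_order_acquire would make the assertion invalid, although a particular machine might still never show the failing outcome.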

  • 2021-01-02 23:18

    The reference you cited is pretty clear: you can't move reads before this load. In your example:

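    #include <atomic>

    static std::atomic<int> X;
    static int L;

    void thread_func()
    {
        int local1 = L;  // (1) plain load before the acquire load
        int x_local = X.load(std::memory_order_acquire);  // (2) acquire load
        int local2 = L;  // (3) plain load after the acquire load
    }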
    

    memory_order_acquire means that (3) cannot happen before (2) (the load in (2) is sequenced before the load in (3)). It says nothing about the relationship between (1) and (2).
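To see why the ordering between (2) and (3) matters, here is a minimal message-passing sketch. It reuses the names X and L from the question, but writer(), reader() and run_message_passing() are hypothetical helpers added for illustration. The plain read of L after the acquire load is safe only because it cannot be moved before that load.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

static std::atomic<int> X{0};
static int L = 0;

// Writer: fill in L, then publish it with a release store to X.
void writer() {
    L = 42;
    X.store(1, std::memory_order_release);
}

// Reader: wait until the flag is visible, then read L.
int reader() {
    while (X.load(std::memory_order_acquire) != 1) {  // plays the role of (2)
        std::this_thread::yield();
    }
    return L;  // plays the role of (3): ordered after the acquire load
}

int run_message_passing() {
    X.store(0, std::memory_order_relaxed);  // reset before any thread starts
    L = 0;
    int seen = -1;
    std::thread r([&] { seen = reader(); });
    std::thread w(writer);
    w.join();
    r.join();
    assert(seen == 42);  // guaranteed by the release/acquire pairing
    return seen;
}
```

A read of L placed before the loop, in the position of (1), would get no such guarantee from X and would race with the writer.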

  • 2021-01-02 23:30

    A load operation with this memory order performs the acquire operation on the affected memory location: no memory accesses in the current thread can be reordered before this load.

    That's like a rule of thumb of compiler code generation.

    But that's absolutely not an axiom of C++.

    There are many cases, some trivially detectable, some requiring more work, where a memory operation Op on V can provably be reordered with an atomic operation X on A.

    The two most obvious cases:

    • when V is a strictly local variable: one that can't be accessed by any other thread (or signal handler) because its address is not made available outside of the function;
    • when A is such a strictly local variable.

    (Note that these two reorderings by the compiler are valid for any of the possible memory orderings specified for X.)

    In any case, the transformation is not visible: it doesn't change the possible executions of valid programs.
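A small sketch of the second case, where A is strictly local. The function names below are hypothetical. Because the address of total never leaves the function, no other thread can observe it, so under the as-if rule a compiler would be allowed to treat the first function like the second one. Whether a given compiler actually performs this transformation is a separate question.

```cpp
#include <atomic>
#include <cassert>

// The atomic is strictly local: its address is never made available
// outside this function, so no other thread can synchronize through it.
int sum_with_local_atomic(const int* data, int n) {
    assert(n >= 0);
    std::atomic<int> total{0};
    for (int i = 0; i < n; ++i) {
        total.fetch_add(data[i], std::memory_order_seq_cst);
    }
    return total.load(std::memory_order_acquire);
}

// What the function above is allowed to be compiled as: no valid
// program can tell the two apart.
int sum_plain(const int* data, int n) {
    assert(n >= 0);
    int total = 0;
    for (int i = 0; i < n; ++i) {
        total += data[i];
    }
    return total;
}
```

Both functions return the same value for the same input, and surrounding memory operations can move freely across the atomic operations on total without any program noticing.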

    There are less obvious cases where these types of code transformations are valid. Some are contrived, some are realistic.

    I can easily come up with this contrived example:

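    #include <atomic>

    using namespace std;

    static atomic<int> A;  // internal linkage: only this TU can touch A

    int do_acq() {
      return A.load(memory_order_acquire);
    }

    void do_rel() {
      A.store(0, memory_order_release);  // the only store: A always holds 0
    } // that's all folks for that TU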
    

    Note the use of a static variable: it makes all operations on the object visible within this translation unit, even in a program built from separately compiled code. The functions that access the atomic synchronization object are not static and can be called from anywhere in the program.

    As a synchronization primitive, operations on A establish synchronizes-with relations. There is one between:

    • thread X that calls do_rel() at point pX
    • and thread Y that calls do_acq() at point pY

    There is a well defined order of modification M of A corresponding to the calls to do_rel() in different threads. Each call to do_acq() either:

    • observes the result of a call to do_rel() at pX_i and synchronizes with thread X by pulling in the history of X at pX_i, or
    • observes the initial value of A

    On the other hand, the value is always 0, so the calling code only ever gets a 0 from do_acq() and cannot determine what happened from the return value. It can know a priori that a modification of A has already happened, but it cannot learn that a posteriori from the value read. The a priori knowledge can come from another synchronization operation, and it is already part of the history of thread Y. Either way, the acquire operation does not add knowledge and does not add past history: the known part of the acquire operation is empty, and it doesn't reliably acquire anything beyond what was already in the past of thread Y at pY_i. So the acquire on A is meaningless and can be optimized out.

    In other words: a program valid for all possible values of M must be valid when do_acq() sees the most recent do_rel() in the history of Y, the one that is before all modifications of A that can be seen. So do_rel() adds nothing in general: it can add a non-redundant synchronizes-with in some executions, but the minimum of what it adds to Y is nothing. A correct program, one that doesn't have a race condition (a race condition meaning that its behavior depends on M, such that its correctness is a function of getting some subset of the allowable values for M), must therefore be prepared to get nothing from do_rel(), so the compiler can make do_rel() a NOP.

    [Note: This line of argument doesn't easily generalize to all RMW operations that read a 0 and store a 0. It probably can't work for acq-rel RMW. In other words, acq+rel RMW are more powerful than separate loads and stores, for their “side effect”.]

    Summary: in that particular example, not only can the memory operations move up and down with respect to an atomic acquire operation, the atomic operations can be removed completely.

  • 2021-01-02 23:31

    I believe this is the correct way to reason about your example within the C++ standard:

    1. X.load(std::memory_order_acquire) (let's call it "operation (A)") may synchronize with a certain release operation on X (operation (R)) - roughly, the operation that assigned the value to X that (A) is reading.

    [atomics.order]/2 An atomic operation A that performs a release operation on an atomic object M synchronizes with an atomic operation B that performs an acquire operation on M and takes its value from any side effect in the release sequence headed by A.

    2. This synchronizes-with relationship may help establish a happens-before relationship between some modification of L and the assignment local2 = L. If that modification of L happens-before (R), then, due to the fact that (R) synchronizes-with (A) and (A) is sequenced-before the read of L, that modification of L happens-before this read of L.

    3. But (A) has no effect whatsoever on the assignment local1 = L. It neither causes data races involving this assignment, nor helps prevent them. If the program is race-free, then it must necessarily employ some other mechanism to ensure that modifications of L are synchronized with this read (and if it's not race-free, then it exhibits undefined behavior and the standard has nothing further to say about it).
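As one possible illustration of "some other mechanism", consider thread creation: the completion of the std::thread constructor synchronizes with the start of the thread function. The sketch below reuses X and L from the question, while run_thread_start_sync() is a hypothetical helper. The read of L in the position of (1) is race-free here, and the acquire load on X contributes nothing to that.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

static std::atomic<int> X{0};
static int L = 0;

int run_thread_start_sync() {
    L = 7;  // written before the thread is created
    int local1 = -1;
    std::thread t([&] {
        // (1): safe, because thread creation synchronizes with the
        // beginning of this function, so the write L = 7 happens-before it.
        local1 = L;
        // (2): the acquire load plays no role in making (1) race-free.
        (void)X.load(std::memory_order_acquire);
    });
    t.join();  // join synchronizes too, so reading local1 here is safe
    assert(local1 == 7);
    return local1;
}
```

If another thread modified L concurrently without its own synchronization, the read at (1) would be a data race regardless of what happens with X.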


    It is meaningless to talk about "instruction reordering" within the four corners of the C++ standard. One may talk about machine instructions generated by a particular compiler, or the way those instructions are executed by a particular CPU. But from the standard's standpoint, these are merely irrelevant implementation details, as long as that compiler and that CPU produce observable behavior consistent with one possible execution path of an abstract machine described by the standard (the As-If rule).
