C++11: the difference between memory_order_relaxed and memory_order_consume

无人共我 2020-12-14 10:52

I am learning the C++11 memory order model and would like to understand the difference between memory_order_relaxed and memory_order_consume.

2 Answers
  •  时光说笑
    2020-12-14 11:26

    Question 1

    No.
    memory_order_relaxed imposes no memory order at all:

    Relaxed operation: there are no synchronization or ordering constraints, only atomicity is required of this operation.

    By contrast, memory_order_consume imposes memory ordering on data-dependent reads (on the current thread):

    A load operation with this memory order performs a consume operation on the affected memory location: no reads in the current thread dependent on the value currently loaded can be reordered before this load.

    Edit

    In general, memory_order_seq_cst is stronger than memory_order_acq_rel, which is stronger than memory_order_relaxed.
    This is like having an Elevator A that can lift 800 kg and an Elevator C that can lift 100 kg.
    Now, if you had the power to magically change Elevator A into Elevator C, what would happen if the former was filled with 10 people of average weight? That would be bad.

    To see exactly what could go wrong with the code, consider the example in your question:

    Thread A                                   Thread B
    Payload = 42;                              g = Guard.load(memory_order_consume);
    Guard.store(1, memory_order_release);      if (g != 0)
                                                   p = Payload;
    

    These snippets are intended to be looped; there is no synchronization, only ordering, between the two threads.
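The two-thread pattern above can be written as a compilable sketch. The function names (`producer`, `consumer`, `run_message_passing`) are hypothetical, and the load uses memory_order_acquire rather than memory_order_consume: this is a deliberate swap, because here the read of Payload depends on g only through the branch and not through the loaded value itself, and acquire is the ordering that certainly covers that case.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Names mirror the snippet above; the helper functions are hypothetical.
int Payload = 0;              // plain, non-atomic data
std::atomic<int> Guard{0};    // flag that publishes Payload

// Thread A: write the payload, then publish it with a release store.
void producer() {
    Payload = 42;
    Guard.store(1, std::memory_order_release);
}

// Thread B: wait until the flag is visible, then read the payload.
// The acquire load (an assumption of this sketch) keeps the read of
// Payload from being reordered before the load of Guard.
int consumer() {
    while (Guard.load(std::memory_order_acquire) == 0) {
        std::this_thread::yield();
    }
    return Payload;
}

// Runs both threads once and returns the value Thread B read.
int run_message_passing() {
    Payload = 0;
    Guard.store(0, std::memory_order_relaxed);
    int p = -1;
    std::thread b([&p] { p = consumer(); });
    std::thread a(producer);
    a.join();
    b.join();
    assert(p == 42);  // release/acquire makes the store to Payload visible
    return p;
}
```

Once the consumer has seen Guard != 0, the release/acquire pair guarantees that it also sees Payload == 42.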

    With memory_order_relaxed, and assuming that a natural word load/store is atomic, the code would be equivalent to

    Thread A                                   Thread B
    Payload = 42;                              g = Guard;
    Guard = 1;                                 if (g != 0)
                                                   p = Payload;
    

    From the CPU's point of view, Thread A performs two stores to two separate addresses, so if Guard is "closer" to the CPU (meaning the store will complete faster), then to another processor it seems that Thread A is performing

    Thread A
    Guard = 1
    Payload = 42
    

    And this order of execution is possible

    Thread A   Guard = 1
    Thread B   g = Guard
    Thread B   if (g != 0) p = Payload
    Thread A   Payload = 42
    

    And that's bad, since Thread B reads a stale (not yet updated) value of Payload.

    It might seem, however, that the ordering in Thread B is unnecessary, since the CPU won't do a reordering like

    Thread B
    if (g != 0) p = Payload;
    g = Guard
    

    But it actually will.

    From its perspective there are two unrelated loads. It is true that one is on a dependent data path, but the CPU can still perform that load speculatively:

    Thread B
    hidden_tmp = Payload;
    g = Guard
    if (g != 0) p = hidden_tmp
    

    That may generate the sequence

    Thread B   hidden_tmp = Payload;
    Thread A   Payload = 42;
    Thread A   Guard = 1;
    Thread B   g = Guard
    Thread B   if (g != 0) p = hidden_tmp
    

    Whoops.
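The relaxed variant discussed above can be sketched as follows. As an assumption of this sketch, Payload is also made a std::atomic<int>, so that the program has no data race and both outcomes are well defined; the function names are hypothetical. The point is only that the language permits the reader to observe Guard == 1 and still read the old Payload; whether that ever happens on a given machine depends on the hardware and the compiler.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Both variables are atomic here (assumption of this sketch) so that the
// relaxed accesses are race-free.
std::atomic<int> Payload{0};
std::atomic<int> Guard{0};

// Thread A: two relaxed stores; nothing orders them for other threads.
void producer_relaxed() {
    Payload.store(42, std::memory_order_relaxed);
    Guard.store(1, std::memory_order_relaxed);  // may become visible first
}

// Thread B: two relaxed loads; the second may observe the old Payload.
int consumer_relaxed() {
    while (Guard.load(std::memory_order_relaxed) == 0) {
        std::this_thread::yield();
    }
    return Payload.load(std::memory_order_relaxed);  // 0 or 42
}

// Runs both threads once and returns what Thread B read.
int run_relaxed() {
    Payload.store(0, std::memory_order_relaxed);
    Guard.store(0, std::memory_order_relaxed);
    int p = -1;
    std::thread b([&p] { p = consumer_relaxed(); });
    std::thread a(producer_relaxed);
    a.join();
    b.join();
    assert(p == 0 || p == 42);  // both results are allowed by the model
    return p;
}
```

On strongly ordered hardware such as x86 the stale result is unlikely to show up in practice, but the code is still not entitled to rely on reading 42.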

    Question 2

    In general that can never be done.
    You can replace memory_order_acquire with memory_order_consume when you are going to generate an address dependency between the loaded value and the value(s) whose access needs to be ordered.
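A minimal sketch of such an address dependency is pointer publication: the reader loads a pointer and then dereferences it, so the address of the second access is computed from the loaded value. The `Message` type and the function names are hypothetical, chosen only for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

struct Message { int value; };  // hypothetical payload type

std::atomic<Message*> Published{nullptr};

// Writer: fill in the object, then publish its address with release.
void publish(Message* m) {
    m->value = 42;
    Published.store(m, std::memory_order_release);
}

// Reader: the consume load returns the pointer, and the read of
// m->value uses an address derived from that loaded pointer, which is
// the kind of dependency consume is meant to order.
int read_published() {
    Message* m;
    while ((m = Published.load(std::memory_order_consume)) == nullptr) {
        std::this_thread::yield();
    }
    return m->value;
}

// Runs one writer and one reader and returns the value the reader saw.
int run_consume() {
    Message msg{0};
    Published.store(nullptr, std::memory_order_relaxed);
    int v = -1;
    std::thread reader([&v] { v = read_published(); });
    std::thread writer(publish, &msg);
    writer.join();
    reader.join();
    assert(v == 42);  // the dependent read sees the published contents
    return v;
}
```

If the reader instead touched some unrelated variable after the load, there would be no dependency on the loaded pointer and consume would not order that access; acquire would be needed.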


    To understand memory_order_relaxed we can take the ARM architecture as a reference.
    The ARM architecture mandates only weak memory ordering, meaning that in general the loads and stores of a program can be executed in any order.

    str r0, [r2]
    str r0, [r3]
    

    In the snippet above the store to [r3] can be observed, externally, before the store to [r2] [1].

    However, the ARM CPU doesn't go as far as the Alpha CPU and imposes two kinds of dependencies: an address dependency, when a value loaded from memory is used to compute the address of another load/store, and a control dependency, when a value loaded from memory is used to compute the control flags of another load/store.

    In the presence of such a dependency, the two memory operations are guaranteed to be visible in program order:

    If there is an address dependency then the two memory accesses are observed in program order.

    So, while a memory_order_acquire would generate a memory barrier, with memory_order_consume you are telling the compiler that the way you'll use the loaded value will generate an address dependency and so it can, if relevant to the architecture, exploit this fact and omit a memory barrier.


    [1] If r2 is the address of a synchronization object, that's bad.
