For purposes of ordering, is atomic read-modify-write one operation or two?

Question

Consider an atomic read-modify-write operation such as x.exchange(..., std::memory_order_acq_rel). For purposes of ordering with respect to loads and stores to other objects, is this treated as:

1. A single operation with acquire-release semantics?
2. Or an acquire load followed by a release store, with the added guarantee that other loads and stores to x will observe both of them or neither?

If it's #2, then although no other operations in the same thread could be reordered before the load or after the store, it leaves open the possibility that they could be reordered in between the two.

As a concrete example, consider:

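#include <atomic>
#include <iostream>

// Namespace-scope atomics are zero-initialized, so both start at 0.
std::atomic<int> x, y;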

void thread_A() {
    x.exchange(1, std::memory_order_acq_rel);
    y.store(1, std::memory_order_relaxed);
}

void thread_B() {
    // These two loads cannot be reordered
    int yy = y.load(std::memory_order_acquire);
    int xx = x.load(std::memory_order_acquire);
    std::cout << xx << ", " << yy << std::endl;
}

Is it possible for thread_B to output 0, 1?

If the x.exchange() were replaced by x.store(1, std::memory_order_release); then thread_B could certainly output 0, 1. Should the extra implicit load in exchange() rule that out?
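To experiment with this, the example above could be wrapped in a small harness along the following lines. This is only a sketch: the names run_once and count_zero_one are made up for illustration, and starting fresh threads for each run leaves a very small race window. A count of zero therefore says nothing about whether 0, 1 is forbidden, and on strongly ordered hardware such as x86 the outcome is not expected to appear at all.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <utility>

// Hypothetical helper: runs thread_A and thread_B from the question once
// on fresh atomics and returns the pair (xx, yy) that thread_B observed.
std::pair<int, int> run_once() {
    std::atomic<int> x{0}, y{0};
    int xx = -1, yy = -1;

    std::thread a([&] {
        x.exchange(1, std::memory_order_acq_rel);
        y.store(1, std::memory_order_relaxed);
    });
    std::thread b([&] {
        yy = y.load(std::memory_order_acquire);
        xx = x.load(std::memory_order_acquire);
    });
    a.join();
    b.join();

    // Sanity check: both loads ran and wrote their results.
    assert(xx != -1 && yy != -1);
    return {xx, yy};
}

// Hypothetical helper: counts how many runs produced xx == 0, yy == 1.
long count_zero_one(long iterations) {
    long hits = 0;
    for (long i = 0; i < iterations; ++i) {
        std::pair<int, int> r = run_once();
        if (r.first == 0 && r.second == 1) ++hits;
    }
    return hits;
}
```

Calling count_zero_one(1000000) on an ARM64 machine and printing the result would be one way to look for the outcome in practice.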

cppreference makes it sound like #1 is the case and 0, 1 is forbidden:

"A read-modify-write operation with this memory order is both an acquire operation and a release operation. No memory reads or writes in the current thread can be reordered before or after this store."

But I can't find anything explicit in the standard to support this. In fact, the standard says very little about atomic read-modify-write operations at all, apart from 31.4 (10) in N4860, which only states the obvious property that the read must see the last value written before the write. So although I hate to question cppreference, I'm wondering whether it is actually correct.

I'm also looking at how it's implemented on ARM64. Both gcc and clang compile thread_A as essentially

ldaxr [x]
stlxr #1, [x]
str #1, [y]

(See on godbolt.) Based on my understanding of ARM64 semantics, and some tests (with a load of y instead of a store), I think that the str [y] can become visible before the stlxr [x] (though of course not before the ldaxr). This would make it possible for thread_B to observe 0, 1. So if #1 is true then it would seem that gcc and clang are both wrong, which I hesitate to believe.
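The exact test is not shown here, but one shape a test "with a load of y instead of a store" might take is a store-buffering style litmus test, sketched below. The function name sb_once and the symmetric second thread are assumptions made for illustration. If both results come back 0, then in at least one thread the relaxed load must have been satisfied before that thread's exchange became visible to the other thread.

```cpp
#include <atomic>
#include <thread>
#include <utility>

// Hypothetical litmus test: each thread does an acq_rel exchange on one
// variable, then a relaxed load of the other. Returns (r1, r2).
std::pair<int, int> sb_once() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;

    std::thread a([&] {
        x.exchange(1, std::memory_order_acq_rel);
        r1 = y.load(std::memory_order_relaxed);  // may this pass the store half?
    });
    std::thread b([&] {
        y.exchange(1, std::memory_order_acq_rel);
        r2 = x.load(std::memory_order_relaxed);
    });
    a.join();
    b.join();
    return {r1, r2};
}
```

As with any such test, an outcome that never shows up proves nothing. Only an observed r1 == 0, r2 == 0 would be informative.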

Finally, as far as I can tell, replacing memory_order_acq_rel with seq_cst wouldn't change anything about this analysis, since it only adds semantics with respect to other seq_cst operations, and we don't have any here.

I found "What exact rules in the C++ memory model prevent reordering before acquire operations?" which, if I understand it correctly, seems to agree that #2 is correct, and that 0, 1 could be observed. I'd still appreciate confirmation, as well as a check on whether the cppreference quote is actually wrong or if I'm misunderstanding it.

Source: https://stackoverflow.com/questions/65568185/for-purposes-of-ordering-is-atomic-read-modify-write-one-operation-or-two

Tags

c++

atomic

memory-barriers

stdatomic