GCC reordering up across load with `memory_order_seq_cst`. Is this allowed?

≯℡__Kan透↙ 提交于 2019-12-04 02:24:24

Congratulations, I think you've hit a bug in gcc!

Now I think you can make a reasonable argument, as the other answer does, that the original code you showed could perhaps have been correctly optimized that way by gcc by relying on a fairly obscure argument about the unconditional access to value: essentially you can't have been relying on a synchronizes-with relationship between the load seq0 = seq_.load(); and the subsequent read of value, so reading it "somewhere else" shouldn't change the semantics of a race-free program. I'm not actually sure of this argument, but here's a "simpler" case I got from reducing your code:

#include <atomic>
#include <iostream>

std::atomic<std::size_t> seq_;
std::size_t value;

auto load()
{
    std::size_t copy;
    std::size_t seq0;
    do
    {
        seq0 = seq_.load();
        if (!seq0) continue;
        copy = value;
        seq0 = seq_.load();
    } while (!seq0);

    return copy;
}

This isn't a seqlock or anything - it just waits for seq0 to change from zero to non-zero, and then reads value. The second read of seq_ is superfluous as is the while condition, but without them the bug goes away.

This is now the read-side of the well known idiom which does work and is race-free: one thread writes to value, then sets seq0 non-zero with a release store. The threads calling load see the non-zero store, and synchronize with it, and so can safely read value. Of course, you can't keep writing to value, it's a "one time" initialization, but this a common pattern.

With the above code, gcc is still hoisting the read of value:

load():
        mov     rax, QWORD PTR value[rip]
.L2:
        mov     rdx, QWORD PTR seq_[rip]
        test    rdx, rdx
        je      .L2
        mov     rdx, QWORD PTR seq_[rip]
        test    rdx, rdx
        je      .L2
        rep ret

Oops!

This behavior occurs up to gcc 7.3, but not in 8.1. Your code also compiles as you wanted in 8.1:

    mov     rbx, QWORD PTR seq_[rip]
    mov     rbp, QWORD PTR value[rip]
    mov     rax, QWORD PTR seq_[rip]

Reordering such operations is not allowed in general, but it is allowed in this case because you invoked undefined behavior by creating a race condition in the read by interleaving a non-atomic read and write in different threads.

The C++11 standard says:

Two expression evaluations conflict if one of them modifies a memory location (1.7) and the other one accesses or modifies the same memory location.

And also that:

The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior.

This applies even to things that occur before the undefined behavior:

A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input. However, if any such execution contains an undefined operation, this International Standard places no requirement on the implementation executing that program with that input (not even with regard to operations preceding the first undefined operation).

Because reading from a non-atomic write there creates undefined behavior (even if you overwrite and ignore the value), GCC is allowed to assume it does not occur and thus optimize out the seqlock.

Based on another answer, it seems this is actually caused by a bug in GCC which persists when you fix the UB, but that optimization wasn't technically invalid for your code since you invoked UB.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!