Using a simplified version of a basic seqlock, gcc reorders a nonatomic load up across an atomic load(memory_order_seq_cst) when compiling the code with -O3
Congratulations, I think you've hit a bug in gcc!
Now I think you can make a reasonable argument, as the other answer does, that the original code you showed could perhaps have been correctly optimized that way by gcc, by relying on a fairly obscure argument about the unconditional access to value: essentially, you can't have been relying on a synchronizes-with relationship between the load seq0 = seq_.load(); and the subsequent read of value, so reading it "somewhere else" shouldn't change the semantics of a race-free program. I'm not actually sure of this argument, but here's a "simpler" case I got from reducing your code:
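    #include <atomic>
    #include <cstddef>

    std::atomic<std::size_t> seq_;  // flag, read with the default seq_cst ordering
    std::size_t value;              // plain, non-atomic data

    auto load()
    {
        std::size_t copy;
        std::size_t seq0;
        do
        {
            seq0 = seq_.load();
            if (!seq0) continue;    // nothing published yet: go to the loop condition and retry
            copy = value;           // must not be read before the load above sees non-zero
            seq0 = seq_.load();
        } while (!seq0);

        return copy;
    }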
This isn't a seqlock or anything - it just waits for seq_ to change from zero to non-zero, and then reads value. The second read of seq_ is superfluous, as is the while condition, but without them the bug goes away.
This is now the read side of a well-known idiom which does work and is race-free: one thread writes to value, then sets seq_ non-zero with a release store. The threads calling load see the non-zero store, synchronize with it, and so can safely read value. Of course, you can't keep writing to value, it's a "one time" initialization, but this is a common pattern.
With the above code, gcc is still hoisting the read of value:
    load():
            mov     rax, QWORD PTR value[rip]
    .L2:
            mov     rdx, QWORD PTR seq_[rip]
            test    rdx, rdx
            je      .L2
            mov     rdx, QWORD PTR seq_[rip]
            test    rdx, rdx
            je      .L2
            rep ret
Oops!
This behavior occurs up to gcc 7.3, but not in 8.1. Your code also compiles as you wanted in 8.1:
            mov     rbx, QWORD PTR seq_[rip]
            mov     rbp, QWORD PTR value[rip]
            mov     rax, QWORD PTR seq_[rip]
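For reference, here is a minimal sketch of both sides of the one-time publication idiom described earlier, in case it helps to see why the read of value must stay after the load of seq_. The globals seq_ and value come from the reduced example; the function names publish, wait_and_read and run_demo, the published value 42, and the explicit release/acquire orderings are assumptions made for illustration (the reduced code uses the default seq_cst load, which is at least as strong as acquire).

```cpp
#include <atomic>
#include <cstddef>
#include <thread>

std::atomic<std::size_t> seq_{0};  // zero means "nothing published yet"
std::size_t value = 0;             // plain, non-atomic data

// Writer side (hypothetical name): write the data first, then publish it.
void publish(std::size_t v)
{
    value = v;                                 // ordinary store
    seq_.store(1, std::memory_order_release);  // release store publishes `value`
}

// Reader side (hypothetical name): wait for the flag, then read the data.
std::size_t wait_and_read()
{
    // The acquire load synchronizes with the release store once it sees
    // a non-zero value, so the read of `value` below is race-free.
    while (seq_.load(std::memory_order_acquire) == 0) {
        std::this_thread::yield();
    }
    return value;  // hoisting this read above the loop would break the idiom
}

// One-shot demo (hypothetical name). Call it only once: a second call would
// write `value` while the flag is already set, which is a data race.
std::size_t run_demo()
{
    std::size_t seen = 0;
    std::thread reader([&seen] { seen = wait_and_read(); });
    std::thread writer([] { publish(42); });
    writer.join();
    reader.join();  // join makes the write to `seen` visible here
    return seen;
}
```

Calling run_demo() once from main should return the published value. If the compiler moves the read of value above the wait loop, as in the assembly shown for gcc up to 7.3, the reader can return the stale initial value instead.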