Using a simplified version of a basic seqlock, GCC reorders a non-atomic load up across an atomic load(memory_order_seq_cst) when compiling the code with -O3.
Note:
Based on another answer, it seems this is actually caused by a bug in GCC that persists even after you fix the UB. Still, the optimization wasn't technically invalid for your code, since your code invoked UB, as explained below.
Reordering such operations is not allowed in general, but it is allowed in this case. Any concurrent execution that would yield a different result must interleave the non-atomic read with a write (atomic or non-atomic) from another thread. That is a data race, and a data race is undefined behavior.
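To make the pattern concrete, here is a minimal sketch of the kind of reader this describes. It is only a guess at the shape of the code: the names RacySeqlock, value, write and read are hypothetical, and the payload is assumed to be a plain int. The non-atomic load of value sits between two seq_cst loads of the counter, and nothing stops a writer in another thread from storing to value at that moment.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical reconstruction of a simplified seqlock; the names are
// illustrative and do not come from the original code.
struct RacySeqlock {
    std::atomic<unsigned> seq{0};
    int value = 0;  // plain int: a concurrent read and write of it is a data race

    void write(int v) {
        seq.fetch_add(1, std::memory_order_seq_cst);  // counter becomes odd: write in progress
        value = v;                                    // non-atomic store
        seq.fetch_add(1, std::memory_order_seq_cst);  // counter becomes even: write finished
    }

    int read() const {
        for (;;) {
            unsigned before = seq.load(std::memory_order_seq_cst);
            int v = value;  // non-atomic load; races with write() if a writer is active
            unsigned after = seq.load(std::memory_order_seq_cst);
            if (before == after && (before & 1u) == 0) return v;
        }
    }
};

// Single-threaded use is well-defined, because no other thread touches `value`.
int single_threaded_demo() {
    RacySeqlock s;
    s.write(7);
    assert(s.seq.load() == 2);  // one complete write bumps the counter twice
    return s.read();
}
```

Calling write() and read() from different threads without further synchronization is exactly the situation the standard classifies as a data race, even though the reader would discard any value it read while a write was in progress.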
The C++11 standard says:
Two expression evaluations conflict if one of them modifies a memory location (1.7) and the other one accesses or modifies the same memory location.
And also that:
The execution of a program contains a data race if it contains two conflicting actions in different threads, at least one of which is not atomic, and neither happens before the other. Any such data race results in undefined behavior.
This applies even to things that occur before the undefined behavior:
A conforming implementation executing a well-formed program shall produce the same observable behavior as one of the possible executions of the corresponding instance of the abstract machine with the same program and the same input. However, if any such execution contains an undefined operation, this International Standard places no requirement on the implementation executing that program with that input (not even with regard to operations preceding the first undefined operation).
Because a non-atomic read that races with the write is undefined behavior (even if you then discard the value and retry), GCC is allowed to assume it does not occur and thus optimize out the seqlock. It can do so because, whatever state the first load acquires, that state does not protect the non-atomic read that follows: a later write to the variable, atomic or not, has no guaranteed synchronizes-with relationship with the load that precedes the non-atomic read. In other words, the write could land between the seq_cst load and the subsequent non-atomic read, which is a data race. The mere fact that this could happen shows that the synchronizes-with relationship is missing, so any execution in which it does happen has undefined behavior. The compiler may therefore assume it never happens, which in turn lets it assume that no concurrent write to that variable occurs during the loop at all.
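For contrast, here is a sketch of one common way to remove the UB. This is a general technique, not a description of your code: it assumes a single writer, makes the payload atomic and accesses it with relaxed operations, and uses fences to order those accesses against the counter. The names Seqlock, a, b and concurrent_demo are hypothetical. Since every access is now atomic, there is no data race for the compiler to exploit.

```cpp
#include <atomic>
#include <thread>
#include <utility>

// Race-free seqlock sketch; assumes exactly one writer thread.
struct Seqlock {
    std::atomic<unsigned> seq{0};
    std::atomic<int> a{0}, b{0};  // payload is atomic, so concurrent access is not a data race

    void write(int v) {
        unsigned s = seq.load(std::memory_order_relaxed);
        seq.store(s + 1, std::memory_order_relaxed);          // odd: write in progress
        std::atomic_thread_fence(std::memory_order_release);  // keep the odd store ahead of the payload stores
        a.store(v, std::memory_order_relaxed);
        b.store(v, std::memory_order_relaxed);
        seq.store(s + 2, std::memory_order_release);          // even: publish the payload
    }

    std::pair<int, int> read() const {
        for (;;) {
            unsigned s0 = seq.load(std::memory_order_acquire);
            int x = a.load(std::memory_order_relaxed);
            int y = b.load(std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_acquire);  // keep the payload loads ahead of the recheck
            unsigned s1 = seq.load(std::memory_order_relaxed);
            if (s0 == s1 && (s0 & 1u) == 0) return {x, y};        // stable, even counter: consistent snapshot
        }
    }
};

// Runs one writer against one reader and reports whether every accepted
// snapshot had matching fields.
bool concurrent_demo() {
    constexpr int kLast = 100000;
    Seqlock s;
    std::thread writer([&s] {
        for (int i = 1; i <= kLast; ++i) s.write(i);
    });
    bool consistent = true;
    int seen = 0;
    while (seen < kLast) {
        std::pair<int, int> p = s.read();
        if (p.first != p.second) consistent = false;
        seen = p.first;
    }
    writer.join();
    return consistent;
}
```

The reader can still observe a half-written payload, but it only observes it through atomic loads and then rejects it when the counter check fails, so the program stays within defined behavior.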