Suppose A, B, a, and b are all variables, and the addresses of A, B, a, and b are all distinct.
Since A = a; and B = b; are independent in terms of data dependencies, the order should not matter. Ordering matters only when an output of a previous instruction affects a subsequent instruction's input; otherwise it does not. Normally, execution is strictly sequential.
If there is no dependency between instructions, they may also be executed out of order, as long as the final outcome is not affected. You can observe this while debugging code compiled at a higher optimization level.
It may be of interest that if you do this:
{ A=a, B=b; /*etc*/ }
Note the comma in place of the semi-colon.
Then the C++ specification and any conforming compiler must guarantee the evaluation order, because operands of the comma operator are always evaluated left to right. Note, however, that this guarantee applies only to the abstract machine: under the as-if rule the optimizer may still reorder the underlying stores as long as the single-threaded observable behavior is unchanged, so the comma operator is not a reliable barrier for thread synchronization.
The compiler is only obligated to emulate the observable behavior of a program, so if a reordering would not violate that principle, then it is allowed. This assumes the behavior is well defined: if your program contains undefined behavior, such as a data race, then its behavior is unpredictable, and as commented it would require some form of synchronization to protect the critical section.
A Useful Reference
An interesting article that covers this is Memory Ordering at Compile Time, which says:
The cardinal rule of memory reordering, which is universally followed by compiler developers and CPU vendors, could be phrased as follows:
Thou shalt not modify the behavior of a single-threaded program.
An Example
The article provides a simple program where we can see this reordering:
int A, B; // Note: static storage duration so initialized to zero
void foo()
{
A = B + 1;
B = 0;
}
and shows that at higher optimization levels the store for B = 0
is done before the store for A = B + 1
, and we can reproduce this result using godbolt, which with -O3
produces the following (see it live):
movl    B(%rip), %eax   # B, B.1
movl    $0, B(%rip) #, B
addl    $1, %eax    #, D.1624
movl    %eax, A(%rip)   # D.1624, A
Why?
Why does the compiler reorder? The article explains that it is for exactly the same reason the processor does so: the complexity of the architecture:
As I mentioned at the start, the compiler modifies the order of memory interactions for the same reason that the processor does it – performance optimization. Such optimizations are a direct consequence of modern CPU complexity.
Standards
In the draft C++ standard this is covered in section 1.9 Program execution, which says (emphasis mine going forward):
The semantic descriptions in this International Standard define a parameterized nondeterministic abstract machine. This International Standard places no requirement on the structure of conforming implementations. In particular, they need not copy or emulate the structure of the abstract machine. Rather, conforming implementations are required to emulate (only) the observable behavior of the abstract machine as explained below.5
Footnote 5 tells us this is also known as the as-if rule:
This provision is sometimes called the “as-if” rule, because an implementation is free to disregard any requirement of this International Standard as long as the result is as if the requirement had been obeyed, as far as can be determined from the observable behavior of the program. For instance, an actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no side effects affecting the observable behavior of the program are produced.
The draft C99 and draft C11 standards cover this in section 5.1.2.3
Program execution, although we have to go to the index to see that it is called the as-if rule in the C standard as well:
as−if rule, 5.1.2.3
Update on Lock-Free considerations
The article An Introduction to Lock-Free Programming covers this topic well, and for the OP's concerns about a lock-less shared hash table implementation this section is probably the most relevant:
Memory Ordering
As the flowchart suggests, any time you do lock-free programming for multicore (or any symmetric multiprocessor), and your environment does not guarantee sequential consistency, you must consider how to prevent memory reordering.
On today’s architectures, the tools to enforce correct memory ordering generally fall into three categories, which prevent both compiler reordering and processor reordering:
- A lightweight sync or fence instruction, which I’ll talk about in future posts;
- A full memory fence instruction, which I’ve demonstrated previously;
- Memory operations which provide acquire or release semantics.
Acquire semantics prevent memory reordering of operations which follow it in program order, and release semantics prevent memory reordering of operations preceding it. These semantics are particularly suitable in cases when there’s a producer/consumer relationship, where one thread publishes some information and the other reads it. I’ll also talk about this more in a future post.
Both standards allow these instructions to be performed out of order, so long as that does not change observable behaviour. This is known as the as-if rule.
Note that, as pointed out in the comments, what is meant by "observable behaviour" is the observable behaviour of a program with defined behaviour. If your program has undefined behaviour, then the compiler is excused from reasoning about it.
My read is that this is required to work by the C++ standard; however, if you're trying to use it for multithreading control, it doesn't work in that context, because there is nothing here to guarantee that the registers get written to memory in the right order.
As your edit indicates, you are trying to use it exactly where it will not work.