In the libuv source, I found this code:
/* The if statement lets the compiler compile it to a conditional store.
* Avoids dirtying a cache line.
*/
if (loop->stop_flag != 0)
  loop->stop_flag = 0;
"to cache" means to hide something. The function of a Cache in computing is to hide the distance to main memory, by preempting main memory accesses as much as possible.
That only works if you used the data before, you have not pushed it out of cache yet, and nobody else took it away before you. Any other actor (other CPU, IO-Bus, ...) must be able to get the current value and change it, even if you have it cached. This task is accomplished using cache coherency protocols. Higher coherency means higher cost.
What your code tries to do, is make the compiler emit a conditional move, so the CPU checks for 0, and only writes if not 0. There's a whole family of conditional move instructions in the Intel/AMD IS and many others.
So now, step by step: the if compiles to a compare and a conditional branch around the store; when the flag is already 0, the store is skipped, the cache line is never marked dirty, and no coherency traffic is generated; when the flag is non-zero, the store happens exactly as it would without the if.
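As a sketch (the helper name is invented, this is not libuv's API), the same idiom written out in C looks like this:

/* Only touch memory when the stored value would actually change, so an
 * already-clean cache line stays clean. */
static void clear_flag_if_set(int *flag)
{
    if (*flag != 0)   /* a read does not dirty the line */
        *flag = 0;    /* write only when needed */
}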
So, is it worth it? It depends.
Aside: Why provide a conditional store instruction if you can synthesise one from a conditional jump and a store? The advantages are fewer instructions and no risk of flushing the instruction pipeline (discarding partly executed following instructions). UPDATE: It looks like the conditional move instructions cannot move from a register/immediate to memory on x86/x86_64.
A cache is organized in blocks of fast memory that, for historical reasons, are called lines. When you write to a cache line, it is marked as "dirty," meaning a bit is set in the cache controller hardware signifying that the line needs to be copied to other levels of cache and/or main memory before some other part of the system can access it.
In general, each level of the memory hierarchy (registers; L1, L2, L3, ... cache; main memory; and swap space) holds its own copy of the same information. Making sure that different parts of the system (processors, DMA, the video subsystem, etc.) see the same value even though one or more copies may have changed is called the coherency problem.
The general solution is pausing to copy an updated value to different levels of the hierarchy. This is called a flush.
A flush can cost tens of processor cycles, or, in the worst case when it causes a page fault, perhaps millions.
Due to this high cost, hardware designers go to great lengths to minimize the need for flushes. Here the programmer has taken up the cause, too.
The comment is saying "if the cache already contains a zero in the flag, let's not write a zero over the zero because this will mark the cache line dirty, which may cause an unnecessary flush."
"Conditional store" is a slightly obscure term. It just refers to a jump on zero around the normal store, which is the code a compiler will produce from the if
statement. In X86 it will look something like:
;; assume edi holds the value of the pointer 'loop'
;; and flag is the constant offset of the 'stop_flag' field
cmp dword ptr [edi + flag], 0   ; is stop_flag already zero?
jz  no_store                    ; if so, skip the store
mov dword ptr [edi + flag], 0   ; otherwise write the zero
no_store:
... code continues
With the if statement missing, you'd have only the last mov instruction.
NB: A commenter pointed out that there do exist single "conditional move/store" instructions on important processor architectures. I have not seen gcc produce one.
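For what it's worth, one way to check what your own compiler does with this pattern is to compile a minimal test case (the file and function names here are invented, not from libuv) and read the generated assembly:

/* store0.c - minimal test case for the guarded-store pattern */
void clear_if_set(int *flag)
{
    if (*flag != 0)
        *flag = 0;
}

Compiling this with something like gcc -O2 -S -masm=intel store0.c typically yields a load, a test, a conditional jump, and a plain mov, essentially the hand-written sequence above rather than any single conditional-store instruction.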
Whether this is a worthwhile optimization is very debatable. Conditionals have their own risks of flushing the instruction pipeline (a different kind of flush). Never sacrifice clarity for speed without clear evidence that it's needed.
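If you want that evidence, a rough way to gather it is a micro-benchmark. The sketch below is hypothetical (not libuv code, names invented): a reader thread keeps loading the flag, ideally from another core, while the main thread performs either the unconditional store or the guarded store. Relaxed atomics are used only to keep the test free of data races; results will vary a lot between machines and depend on whether the flag is usually zero, so treat it as a starting point, not a conclusion.

/* bench.c - hypothetical sketch; build with e.g. cc -O2 -pthread bench.c.
 * The flag stays zero the whole time, which is exactly the case the
 * libuv comment is optimizing for. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <time.h>

#define ITERS 50000000UL

static atomic_int flag;               /* never set to non-zero here */
static atomic_int done;
static volatile unsigned long sink;   /* keeps the reader's loads observable */

static void *reader(void *arg)
{
    unsigned long sum = 0;
    (void)arg;
    while (!atomic_load_explicit(&done, memory_order_relaxed))
        sum += (unsigned long)atomic_load_explicit(&flag, memory_order_relaxed);
    sink = sum;
    return NULL;
}

static double run(int guarded)
{
    struct timespec t0, t1;
    unsigned long i;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < ITERS; i++) {
        if (guarded) {
            /* libuv-style: read, and store only if the flag is non-zero */
            if (atomic_load_explicit(&flag, memory_order_relaxed) != 0)
                atomic_store_explicit(&flag, 0, memory_order_relaxed);
        } else {
            /* unconditional store: dirties the cache line every iteration */
            atomic_store_explicit(&flag, 0, memory_order_relaxed);
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void)
{
    pthread_t t;

    pthread_create(&t, NULL, reader, NULL);
    printf("unconditional store: %.3f s\n", run(0));
    printf("guarded store:       %.3f s\n", run(1));
    atomic_store(&done, 1);
    pthread_join(t, NULL);
    return 0;
}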