Question
In this article: http://www.drdobbs.com/parallel/volatile-vs-volatile/212701484?pgno=2
it says that we can't do certain optimizations for volatile. For example, given volatile int& v = *(address);, this:
v = 1; // C: write to v
local = v; // D: read from v
can't be optimized to this:
v = 1; // C: write to v
local = 1; // D: read from v // but this transformation is allowed for std::atomic<>
It can't be done because between the 1st and 2nd lines the value of v may be changed by a hardware device mapped to this memory location (not by another CPU, where cache coherence works, but by a network adapter, GPU, FPGA, etc., sequentially or concurrently). But that makes sense only if v can't be cached in the CPU's L1/L2/L3 caches, because for an ordinary (non-volatile) variable so little time passes between the 1st and 2nd lines that the read would almost certainly hit the cache.
Does the volatile qualifier guarantee no caching for this memory location?
ANSWER:
- No, volatile does not guarantee that this memory location is uncached, and neither the C/C++ standards nor compiler manuals say anything about caching.
- When a memory-mapped region maps device memory into CPU address space, it is typically already marked WC (write combining) instead of WB (write back), which disables caching, so no explicit cache flushing is needed.
- Conversely, if CPU memory is mapped into the device's address space, the PCIe controller located on the CPU die snoops the data arriving via DMA from the device and updates (invalidates) the CPU's L3 cache. In this case, if code executing on the device uses volatile for the same two lines, it likewise bypasses the device's cache (e.g. the GPU's L2 cache), so neither the GPU cache nor the CPU cache needs to be flushed explicitly. On the CPU side you might still need std::atomic_thread_fence(std::memory_order_seq_cst); if the L3 cache (LLC) is coherent with DMA over PCIe but L1/L2 are not. On nVidia CUDA the counterpart is void __threadfence_system();.
- We need to flush DMA controllers' caches when sending unaligned data (WDK: KeFlushIoBuffers(), FlushAdapterBuffers()).
- Also, we can mark any memory region as uncached or WC ourselves via the MTRR registers.
Answer 1:
volatile ensures that the variable won't be "cached" in a CPU register. The CPU cache itself is transparent to the programmer: if one CPU writes to memory that is held in another CPU's cache, that second CPU's cache line is invalidated, so it will reload the value from memory on its next access.
Something about Cache coherence
As for external memory writes (via DMA or another CPU-independent channel), you might need to flush the cache manually (see this SO question).
C Standard §6.7.3 7:
What constitutes an access to an object that has volatile-qualified type is implementation-defined.
Answer 2:
The semantics of volatile are implementation-defined. If a compiler knew that interrupts would be disabled while a certain piece of code executed, and knew that on the target platform there would be no means other than interrupt handlers by which operations on certain storage would be observable, it could register-cache volatile-qualified variables within such storage just as it caches ordinary variables, provided it documented that behavior.
Note that which aspects of behavior count as "observable" may be defined in some measure by the implementation. If an implementation documents that it is not intended for use on hardware which uses main-RAM accesses to trigger required externally-visible actions, then accesses to main RAM would not be "observable" on that implementation. The implementation would remain compatible with hardware capable of physically observing such accesses, provided nothing cared whether those accesses were actually seen. If such accesses were required, as they would be if they were regarded as "observable", the compiler would not be claiming compatibility and would thus make no promise about anything.
Source: https://stackoverflow.com/questions/18550784/does-volatile-qualifier-cancel-caching-for-this-memory