Intel's CLWB instruction invalidating cache lines

允我心安 submitted on 2020-03-09 05:34:40

Question


I am trying to find a configuration or memory access pattern for which Intel's clwb instruction does not invalidate the cache line. I am testing on an Intel Xeon Gold 5218 processor with NVDIMMs, running Linux 5.4.0-3-amd64. I tried Device-DAX mode, mapping the character device directly into the address space. I also tried adding the non-volatile memory as a new NUMA node and binding memory to it with numactl --membind. In both cases, when I execute clwb on a cached address, the line is evicted. I observe the eviction with PAPI hardware counters, with the prefetchers disabled.

This is the simple loop I am testing. Both array and tmp are declared volatile, so the loads are actually executed.

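for (int i = 0; i < arr_size; i++) {
    tmp = array[i];                     /* first load: brings the line into the cache */
    _mm_clwb((const void *)&array[i]);  /* write the line back; the cast drops volatile */
    _mm_mfence();                       /* order the write-back before the reload */
    tmp = array[i];                     /* second load: misses if clwb evicted the line */
}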

Both reads result in cache misses.

Has anyone else tried to find a configuration or memory access pattern that leaves the cache line in the cache?


Answer 1:


clwb behaves like clflushopt on SKX (Skylake-SP) and CSL (Cascade Lake): it writes the line back and also invalidates it. However, programs that use clwb on these processors will automatically benefit when run on a future processor that supports an optimized implementation of clwb.

Section 2.1.1.4 of the Intel Optimization Manual (September 2019) mentions that clwb is new on Ice Lake Client. Perhaps this means that the performance advantage of clwb is new on Ice Lake. However, the cpuid leaf 0x7 information from InstLatx64 says that ICL doesn't support clwb, so I'm not sure which source is wrong. Someone should check whether _mm_clwb(void const *p) works on ICL. In any case, it will most probably be supported on ICX (Ice Lake server).

clwb is also supported on Zen 2, but I don't know how it works on this microarchitecture.
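
The cpuid leaf 0x7 check mentioned above can be sketched roughly as follows. This is a minimal sketch, assuming GCC or Clang (for the <cpuid.h> helper __get_cpuid_count) and an x86 target; the function and macro names are hypothetical and chosen only for illustration. It reads CPUID.(EAX=7,ECX=0):EBX, where bit 23 reports clflushopt and bit 24 reports clwb. Note that the bit only says the instruction can be executed; it says nothing about whether the implementation keeps the line in the cache.

```c
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  /* GCC/Clang helper header (assumption: one of these compilers) */
#endif

/* Hypothetical names for the feature bits in CPUID.(EAX=7,ECX=0):EBX. */
#define CPUID7_EBX_CLFLUSHOPT (1u << 23)
#define CPUID7_EBX_CLWB       (1u << 24)

/* Returns EBX of cpuid leaf 0x7, subleaf 0, or 0 if the leaf is unavailable. */
unsigned cpuid_leaf7_ebx(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;  /* leaf 0x7 not supported by this processor */
    return ebx;
#else
    return 0;      /* not an x86 target: report no support */
#endif
}

/* True if the processor reports support for clwb. */
bool cpu_has_clwb(void)
{
    return (cpuid_leaf7_ebx() & CPUID7_EBX_CLWB) != 0;
}

/* True if the processor reports support for clflushopt. */
bool cpu_has_clflushopt(void)
{
    return (cpuid_leaf7_ebx() & CPUID7_EBX_CLFLUSHOPT) != 0;
}
```

Calling cpu_has_clwb() before the first _mm_clwb avoids an illegal-instruction fault on processors that lack the instruction. Whether the line survives the write-back still has to be measured, for example with the hardware-counter loop from the question.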



Source: https://stackoverflow.com/questions/60266778/intels-clwb-instruction-invalidating-cache-lines
