CUDA disable L1 cache only for one variable

允我心安 submitted on 2019-11-28 05:06:49
Reguj

As mentioned above, you can use inline PTX. Here is an example:

__device__ __inline__ double ld_gbl_cg(const double *addr) {
  double return_value;
  asm("ld.global.cg.f64 %0, [%1];" : "=d"(return_value) : "l"(addr));
  return return_value;
}

You can easily adapt this to other types by swapping .f64 for .f32 (float), .s32 (int), and so on, and changing the output constraint on return_value from "=d" to "=f" (float), "=r" (int), and so on. Note that the constraint on addr, "l", denotes 64-bit addressing; if you are using 32-bit addressing, it should be "r".
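To make the substitution concrete, here is a minimal sketch of the float and int variants, assuming 64-bit addressing. The function names ld_gbl_cg_f32 and ld_gbl_cg_s32 are hypothetical, and the use of asm volatile (so the compiler does not delete or merge the load) is my own choice rather than part of the original answer.

```cuda
// Hypothetical float variant: .f32 type, "=f" output constraint.
// Assumes 64-bit addressing, hence the "l" constraint on addr.
__device__ __forceinline__ float ld_gbl_cg_f32(const float *addr) {
  float return_value;
  asm volatile("ld.global.cg.f32 %0, [%1];" : "=f"(return_value) : "l"(addr));
  return return_value;
}

// Hypothetical int variant: .s32 type, "=r" output constraint.
__device__ __forceinline__ int ld_gbl_cg_s32(const int *addr) {
  int return_value;
  asm volatile("ld.global.cg.s32 %0, [%1];" : "=r"(return_value) : "l"(addr));
  return return_value;
}
```

Only the loads that go through these wrappers bypass L1; every other access in the kernel keeps the default caching behaviour.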

Greg Smith

Inline PTX can be used to load and store the variable. The ld.cg and st.cg instructions cache data only in L2. The cache operators are described in section 8.7.8.1, Cache Operators, of the PTX ISA 2.3 document. The instructions of interest are ld and st. Inline PTX is described in Using Inline PTX Assembly in CUDA.
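For the store side, a matching wrapper around st.cg might look like the sketch below. The name st_gbl_cg is hypothetical, 64-bit addressing is assumed, and the "memory" clobber is an assumption on my part to stop the compiler from reordering other memory accesses across the store.

```cuda
// Hypothetical store wrapper: writes a double with the .cg cache operator.
// Assumes 64-bit addressing ("l" constraint on addr); "d" is the f64 constraint.
__device__ __forceinline__ void st_gbl_cg(double *addr, double value) {
  asm volatile("st.global.cg.f64 [%0], %1;"
               :                           // no outputs
               : "l"(addr), "d"(value)     // inputs: address, then value
               : "memory");                // the statement writes memory
}
```

As with the load, swapping .f64 for .f32 or .s32 and "d" for "f" or "r" gives the float and int versions.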

If you declare the variable to be volatile, then it will only be cached in the L2 cache on Fermi GPUs. Note that some compiler optimizations, such as removing repeated loads, are not performed on volatile variables because the compiler assumes they may be written by another thread.
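If you would rather not write PTX, one way to apply the volatile approach to a single access is to read through a volatile-qualified pointer, as in this sketch. The helper name ld_volatile is hypothetical, and the L2-only caching behaviour is the one described above for Fermi; I have not verified it on other architectures.

```cuda
// Hypothetical helper: performs one volatile load of a double.
// The cast makes only this access volatile; the rest of the kernel
// can keep using the plain pointer with normal caching and optimization.
__device__ __forceinline__ double ld_volatile(const double *addr) {
  return *reinterpret_cast<const volatile double *>(addr);
}
```

Because the compiler must reissue a volatile load every time, calling this in a loop will not be hoisted or merged, which is the cost mentioned above.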
