Making some, but not all, (CUDA) memory accesses uncached

Submitted by 空扰寡人 on 2020-03-22 08:20:29

Question


I just noticed that it is possible to make (CUDA kernel) memory accesses uncached (see e.g. this answer here on SO).

Can this be done...

  • For a single kernel individually?
  • At run time rather than at compile time?
  • For writes only rather than for reads and writes?

Answer 1:


  1. Only if you compile that kernel individually, because this is an instruction-level feature that is enabled at code generation. Alternatively, you can use inline PTX assembly to issue ld.global.cg instructions for a particular load operation within a kernel [see here for details].
  2. No, it is an instruction-level feature of PTX. You can JIT-compile a version of the code containing non-caching memory loads at run time, but that is still technically compilation. You could probably use some template tricks and separate compilation to get the runtime to hold two versions of the same code, built with and without caching, and choose between those versions at run time. You could also use the same tricks to get two versions of a given kernel, with or without inline PTX for uncached loads [see here for one possibility of achieving this].
  3. These non-caching instructions bypass the L1 cache and go to the L2 cache with byte-level granularity. So they are load-only (all writes invalidate the L1 cache and store to L2).
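
Points 1 and 2 can be sketched together as follows. This is only a minimal illustration, not the answerer's actual code: the names load_cg, scale and launch_scale are hypothetical, the pointers are assumed to refer to global memory, and n is assumed to be greater than zero. A bool template parameter yields two compiled versions of one kernel, and the host picks between them at run time.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: load one float with ld.global.cg (cached in L2, not L1).
// Assumes p points to global memory.
__device__ __forceinline__ float load_cg(const float* p)
{
    float v;
    asm volatile("ld.global.cg.f32 %0, [%1];" : "=f"(v) : "l"(p));
    return v;
}

// One kernel source, two instantiations. Uncached is a compile-time switch,
// so each instantiation contains only one kind of load.
template <bool Uncached>
__global__ void scale(const float* in, float* out, int n, float k)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = Uncached ? load_cg(in + i) : in[i];
        out[i] = k * v;
    }
}

// Host-side dispatch: both versions exist in the binary and the choice
// between them is made at run time. Assumes n > 0.
cudaError_t launch_scale(bool uncached, const float* in, float* out, int n, float k)
{
    const int block = 256;
    const int grid = (n + block - 1) / block;
    if (uncached)
        scale<true><<<grid, block>>>(in, out, n, k);
    else
        scale<false><<<grid, block>>>(in, out, n, k);
    return cudaGetLastError();
}
```

For the whole-kernel variant in point 1, the usual route is to put the kernel in its own translation unit and compile only that file with nvcc -Xptxas -dlcm=cg. Whether that flag has any effect depends on the target architecture.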



Answer 2:


I don't know whether it was possible before, but CUDA 8.0 lets you fine-tune caching for specific reads and writes. See the PTX manual for details.

For example, to make this read always go to main memory:

const float4 val = input[i];

you could write the following:

float4 val;
const float4* myinput = input + i;
// volatile stops the compiler from removing or merging this asm statement
asm volatile("ld.global.cv.v4.f32 {%0, %1, %2, %3}, [%4];" : "=f"(val.x), "=f"(val.y), "=f"(val.z), "=f"(val.w) : "l"(myinput));
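
A possible alternative that avoids inline assembly is the family of cache-hint load intrinsics described in recent versions of the CUDA C++ Programming Guide (__ldcg, __ldcs, __ldcv and others). The sketch below assumes a toolkit and target architecture that provide __ldcv for float4; the kernel and wrapper names are hypothetical, and n is assumed to be greater than zero.

```cuda
#include <cuda_runtime.h>

// Sketch: read each element with the __ldcv() cache-hint intrinsic
// (assumed to be available in the toolkit in use) instead of inline PTX.
__global__ void sum_components(const float4* input, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        const float4 val = __ldcv(input + i);  // intended counterpart of ld.global.cv
        out[i] = val.x + val.y + val.z + val.w;
    }
}

// Hypothetical host wrapper. Assumes n > 0 and device-accessible pointers.
cudaError_t launch_sum_components(const float4* input, float* out, int n)
{
    const int block = 256;
    const int grid = (n + block - 1) / block;
    sum_components<<<grid, block>>>(input, out, n);
    return cudaGetLastError();
}
```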

I managed to speed up one of my cache-intensive kernels by about 20% by using non-cached reads and writes for data that was, by design, accessed only once.
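
The answer shows only the read side. A matching write could look roughly like the sketch below, which pairs the ld.global.cv load with a st.global.wt (write-through) store. The choice of .wt is an assumption, since the answer does not say which store operator was used; the PTX manual also lists .cg and .cs for stores. The names load_cv, store_wt, copy_once and launch_copy_once are hypothetical, and the pointers are assumed to be 16-byte aligned global memory with n greater than zero.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: the load from the answer, wrapped in a function.
__device__ __forceinline__ float4 load_cv(const float4* p)
{
    float4 v;
    asm volatile("ld.global.cv.v4.f32 {%0, %1, %2, %3}, [%4];"
                 : "=f"(v.x), "=f"(v.y), "=f"(v.z), "=f"(v.w)
                 : "l"(p));
    return v;
}

// Hypothetical helper: write-through store (st.global.wt).
// The .wt operator is an assumed choice, not taken from the answer.
__device__ __forceinline__ void store_wt(float4* p, float4 v)
{
    asm volatile("st.global.wt.v4.f32 [%0], {%1, %2, %3, %4};"
                 :
                 : "l"(p), "f"(v.x), "f"(v.y), "f"(v.z), "f"(v.w)
                 : "memory");
}

// Each element is read once and written once, so neither access
// benefits from staying in cache.
__global__ void copy_once(const float4* in, float4* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        store_wt(out + i, load_cv(in + i));
}

// Hypothetical host wrapper. Assumes n > 0.
cudaError_t launch_copy_once(const float4* in, float4* out, int n)
{
    const int block = 256;
    const int grid = (n + block - 1) / block;
    copy_once<<<grid, block>>>(in, out, n);
    return cudaGetLastError();
}
```

Whether this helps at all is workload-dependent, so it is worth profiling both variants before keeping either.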



Source: https://stackoverflow.com/questions/30420774/making-some-but-not-all-cuda-memory-accesses-uncached
