Making some, but not all, (CUDA) memory accesses uncached
Question

I just noticed that it is possible at all to have (CUDA kernel) memory accesses uncached (see e.g. this answer here on SO). Can this be done...

- For a single kernel individually?
- At run time rather than at compile time?
- For writes only rather than for reads and writes?

Answer 1:

Only if you compile that kernel individually, because this is an instruction-level feature which is enabled by code generation. You could also use inline PTX assembler to issue ld.global.cg instructions for a particular load
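The inline PTX route mentioned in the answer can be sketched roughly as follows. This is a minimal illustration, not the answerer's own code: the helper name `load_cg`, the kernel `scale`, and the buffer sizes are hypothetical, and the sketch assumes a 64-bit build in which the pointer passed to the helper really does refer to global memory.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper (not from the original post): load one float with
// ld.global.cg, i.e. cache at the L2 level only and bypass L1.
// Assumes a 64-bit build and that ptr refers to global memory.
__device__ __forceinline__ float load_cg(const float* ptr)
{
    float value;
    asm volatile("ld.global.cg.f32 %0, [%1];" : "=f"(value) : "l"(ptr));
    return value;
}

// Hypothetical kernel: only the read of in[i] uses the .cg load;
// the write to out[i] is an ordinary store.
__global__ void scale(const float* in, float* out, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = factor * load_cg(in + i);
    }
}

int main()
{
    const int n = 1024;
    std::vector<float> host_in(n), host_out(n);
    for (int i = 0; i < n; ++i) host_in[i] = static_cast<float>(i);

    float *dev_in = nullptr, *dev_out = nullptr;
    cudaMalloc(&dev_in, n * sizeof(float));
    cudaMalloc(&dev_out, n * sizeof(float));
    cudaMemcpy(dev_in, host_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    scale<<<(n + 255) / 256, 256>>>(dev_in, dev_out, n, 2.0f);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaMemcpy(host_out.data(), dev_out, n * sizeof(float), cudaMemcpyDeviceToHost);

    // Verify that the .cg load returned the same data a normal load would.
    bool ok = true;
    for (int i = 0; i < n; ++i) {
        if (host_out[i] != 2.0f * host_in[i]) { ok = false; break; }
    }
    std::printf("%s\n", ok ? "ok" : "mismatch");

    cudaFree(dev_in);
    cudaFree(dev_out);
    return ok ? 0 : 1;
}
```

Because the cache operator is attached to a single instruction, only the loads routed through the helper are affected, and the rest of the kernel (and every other kernel in the same file) keeps the default caching behaviour. For the "compile that kernel individually" variant, one option is to place the kernel in its own source file and build only that file with `nvcc -Xptxas -dlcm=cg`; whether either approach changes performance depends on the GPU architecture and should be measured rather than assumed.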