Question
Instead of passing lots of arguments to a kernel, I use a __constant__ variable. This variable is an array of structures, each containing many pointers to data in global memory (these pointers would otherwise be the list of kernel arguments); the array has one entry per dataset the kernel is called on. The kernel indexes this array and dereferences the pointers to reach the appropriate data in global memory. My question is: does this data get cached through L2 or through the constant cache? Moreover, if it is the latter and the data is loaded via __ldg(), does it go through L1 or still through the constant cache?
To be more specific, the data itself sits in global memory; however, the kernel dereferences a __constant__ variable to get to it. Does this adversely affect caching?
Answer 1:
Constant variables read as immediate constants (constants encoded in the opcode) or as indexed constants (read via the LDC instruction) are addressed by a (bank, offset) pair, not by address. These reads go through the immediate constant cache and the indexed constant cache. On some chips these are the same cache. Examples of constant accesses are:
// immediate constant
ADD r0, r1, c[bank][offset]
// r1 has packed version of bank, offset
LDC r0, r1
Kernel arguments for compute capability 2.0 and above are passed in a constant bank, so reading them shows up as immediate constant accesses.
Constant accesses go through the constant memory hierarchy, which in the end resolves to a global address that can be in system memory or device memory.
If you set a constant variable to a pointer to global then the data will be read through the data hierarchy.
If you define a const variable, the compiler can choose to place the read-only data either in a constant bank (accessed by bank/offset) or at an address.
If you review the SASS (nvdisasm or tools) you will see LD instructions. Depending on the chip, this data may be cached in the L1/TEX cache and then the L2 cache.
SHARED
    LDS/STS/ATOMS              -> shared memory

GENERIC
    LD/ST (generic to shared)  -> shared memory
    LD/ST (generic to global)  -> L1/TEX -> L2
    LD/ST (generic to local)   -> L1/TEX -> L2

LOCAL
    LDL/STL (local)            -> L1/TEX -> L2

GLOBAL
    LDG/STG (global)           -> TEX -> L2

INDEXED CONSTANT
    LDC                        -> indexed constant cache -> ... -> L2
L2 misses can go to device memory or pinned system memory.
In the case you mention the constant variable will very likely be accessed via an immediate constant (best performance assuming reasonable size of constants) and the de-referenced pointer will result in a global memory access.
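The pattern from the question can be sketched as follows. This is a minimal illustration, not the asker's actual code: the names DatasetArgs, g_args, kMaxDatasets, scale, and run_demo are all hypothetical. The table of pointers lives in __constant__ memory, while the data it points to lives in global memory, so only the table lookup uses the constant path.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical argument bundle: one entry per dataset.
struct DatasetArgs {
    const float* in;   // points to global memory
    float*       out;  // points to global memory
    int          n;
};

constexpr int kMaxDatasets = 4;  // assumed table size, for illustration only

// The table of pointers sits in constant memory.
__constant__ DatasetArgs g_args[kMaxDatasets];

__global__ void scale(int dataset, float factor)
{
    // Reading g_args[dataset] is a constant access (immediate or indexed,
    // depending on what the compiler emits).
    DatasetArgs a = g_args[dataset];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // Dereferencing a.in / a.out is an ordinary global memory access.
    if (i < a.n) a.out[i] = factor * a.in[i];
}

// Returns the number of wrong elements, or -1 on a CUDA error.
int run_demo()
{
    const int n = 1024;
    std::vector<float> h_in(n), h_out(n, 0.0f);
    for (int i = 0; i < n; ++i) h_in[i] = static_cast<float>(i);

    float *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, n * sizeof(float)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_out, n * sizeof(float)) != cudaSuccess) {
        cudaFree(d_in);
        return -1;
    }
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    // Fill the host copy of the table, then upload it to constant memory.
    DatasetArgs h_args[kMaxDatasets] = {};
    h_args[0] = {d_in, d_out, n};
    cudaMemcpyToSymbol(g_args, h_args, sizeof(h_args));

    scale<<<(n + 255) / 256, 256>>>(0, 2.0f);
    cudaError_t err = cudaMemcpy(h_out.data(), d_out, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    int bad = 0;
    for (int i = 0; i < n; ++i)
        if (h_out[i] != 2.0f * h_in[i]) ++bad;
    return bad;
}

int main()
{
    int bad = run_demo();
    std::printf("mismatches: %d\n", bad);
    return bad == 0 ? 0 : 1;
}
```

Dumping the SASS of the compiled kernel (for example with cuobjdump -sass) should show which constant and global load instructions the compiler actually chose for a given architecture.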
On GK110, LDG instructions are cached in the texture cache.
On Maxwell, LDG.CI instructions are cached in the texture cache. LDG.CA operations are cached in the texture cache (GM20x). All other LDG accesses go through the texture cache but are not cached beyond the lifetime of the warp instruction.
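For the __ldg() part of the question, a minimal sketch might look like the code below. It assumes a device of compute capability 3.5 or higher (where __ldg() is available) and uses hypothetical names throughout (ReadOnlyArgs, g_ro_args, sum_pairs, run_ldg_demo). The pointer is still fetched from the __constant__ table, but the data load itself is requested through the read-only path; which SASS instruction results, and whether the line stays cached, depends on the architecture and compiler as described above.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical table entry: a read-only input and a writable output.
struct ReadOnlyArgs {
    const float* in;   // global memory, never written by the kernel
    float*       out;  // global memory
    int          n;    // number of output elements
};

// Single-entry table in constant memory, for illustration only.
__constant__ ReadOnlyArgs g_ro_args;

__global__ void sum_pairs()
{
    const float* in = g_ro_args.in;   // constant access for the pointer
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < g_ro_args.n) {
        // __ldg() asks for a read-only global load of the pointed-to data.
        g_ro_args.out[i] = __ldg(&in[2 * i]) + __ldg(&in[2 * i + 1]);
    }
}

// Returns the number of wrong elements, or -1 on a CUDA error.
int run_ldg_demo()
{
    const int n = 512;                       // outputs; inputs are 2 * n
    std::vector<float> h_in(2 * n), h_out(n, 0.0f);
    for (int i = 0; i < 2 * n; ++i) h_in[i] = static_cast<float>(i);

    float *d_in = nullptr, *d_out = nullptr;
    if (cudaMalloc(&d_in, 2 * n * sizeof(float)) != cudaSuccess) return -1;
    if (cudaMalloc(&d_out, n * sizeof(float)) != cudaSuccess) {
        cudaFree(d_in);
        return -1;
    }
    cudaMemcpy(d_in, h_in.data(), 2 * n * sizeof(float),
               cudaMemcpyHostToDevice);

    ReadOnlyArgs h_args = {d_in, d_out, n};
    cudaMemcpyToSymbol(g_ro_args, &h_args, sizeof(h_args));

    sum_pairs<<<(n + 127) / 128, 128>>>();
    cudaError_t err = cudaMemcpy(h_out.data(), d_out, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    if (err != cudaSuccess) return -1;

    int bad = 0;
    for (int i = 0; i < n; ++i)
        if (h_out[i] != h_in[2 * i] + h_in[2 * i + 1]) ++bad;
    return bad;
}

int main()
{
    int bad = run_ldg_demo();
    std::printf("mismatches: %d\n", bad);
    return bad == 0 ? 0 : 1;
}
```

The data must not be modified for the lifetime of the kernel when it is read through __ldg(); the sketch respects that by writing only through the separate out pointer.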
Source: https://stackoverflow.com/questions/34170310/cuda-constant-deference-to-global-memory-which-cache