Question
This question is a continuation of Interpreting the verbose output of ptxas, part I.
When we compile a kernel .ptx file with ptxas -v, or compile it from a .cu file with -ptxas-options=-v, we get a few lines of output such as:
ptxas info : Compiling entry function 'searchkernel(octree, int*, double, int, double*, double*, double*)' for 'sm_20'
ptxas info : Function properties for searchkernel(octree, int*, double, int, double*, double*, double*)
72 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 46 registers, 176 bytes cmem[0], 16 bytes cmem[14]
(same example as in the linked-to question; but with name demangling)
This question regards the last line. A few more examples from other kernels:
ptxas info : Used 19 registers, 336 bytes cmem[0], 4 bytes cmem[2]
...
ptxas info : Used 19 registers, 336 bytes cmem[0]
...
ptxas info : Used 6 registers, 16 bytes smem, 328 bytes cmem[0]
How do we interpret the information on this line, other than the number of registers used? Specifically:
- Is cmem short for constant memory?
- Why are there different categories of cmem, i.e. cmem[0], cmem[2], cmem[14]?
- smem probably stands for shared memory; is it only static shared memory?
- Under which conditions does each kind of entry appear on this line?
Answer 1:
Is cmem short for constant memory?
Yes
Why are there different categories of cmem, i.e. cmem[0], cmem[2], cmem[14]?
They represent different constant memory banks. cmem[0] is the reserved bank for kernel arguments and statically sized constant values.
smem probably stands for shared memory; is it only static shared memory?
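It is. It could not be otherwise: the size of dynamic shared memory is only specified at kernel launch, so ptxas cannot know it at compile time.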
Under which conditions does each kind of entry appear on this line?
Mostly answered here.
Answer 2:
Collected and reformatted...
Resources on the last ptxas info line:
- registers - in the register file on every SM (multiprocessor)
- gmem - Global memory
- smem - Static shared memory
- cmem[N] - Constant memory bank with index N.
  - cmem[0] - Bank reserved for kernel arguments and statically-sized constant values
  - cmem[2] - ???
  - cmem[4] - ???
  - cmem[14] - ???
Each of these categories is shown only if the kernel uses some of that memory (registers are probably always shown); thus it is no surprise that all the examples show some cmem[0] usage.
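To make the structure of that last line concrete, here is a minimal Python sketch that splits a "Used ..." line into its resource categories. The function name parse_ptxas_used_line is made up for illustration, and the parser only assumes the line shapes quoted in the question ("N registers" and "N bytes <category>"); other ptxas versions may print additional items that it does not handle.

```python
import re

def parse_ptxas_used_line(line):
    """Split a 'ptxas info : Used ...' line into a {category: amount} dict.

    Hypothetical helper: it assumes only the item shapes seen in the
    question, i.e. '<n> registers' and '<n> bytes <category>'.
    """
    match = re.search(r"Used\s+(.*)$", line.strip())
    if match is None:
        raise ValueError("not a 'Used ...' resource line")
    usage = {}
    for item in match.group(1).split(","):
        item = item.strip()
        # The 'bytes' unit is optional so that '<n> registers' also matches.
        m = re.fullmatch(r"(\d+)\s+(?:bytes\s+)?(\S+)", item)
        if m is None:
            raise ValueError(f"unrecognized item: {item!r}")
        amount, category = m.groups()
        usage[category] = int(amount)
    return usage

# The example lines quoted in the question.
examples = [
    "ptxas info : Used 46 registers, 176 bytes cmem[0], 16 bytes cmem[14]",
    "ptxas info : Used 19 registers, 336 bytes cmem[0], 4 bytes cmem[2]",
    "ptxas info : Used 19 registers, 336 bytes cmem[0]",
    "ptxas info : Used 6 registers, 16 bytes smem, 328 bytes cmem[0]",
]

for line in examples:
    print(parse_ptxas_used_line(line))
```

Categories the kernel does not use are simply absent from the resulting dict, which mirrors how ptxas omits them from the line: only the last example has an smem key.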
You can read a bit more on the CUDA memory hierarchy in Section 2.3 of the Programming Guide and the links there. Also, there's this blog post about static vs dynamic shared memory.
Source: https://stackoverflow.com/questions/56176307/interpreting-the-verbose-output-of-ptxas-part-ii