Question
I know about nvvp and nvprof, of course, but for various reasons nvprof does not want to work with my app, which involves lots of shared libraries. nvidia-smi can hook into the driver to find out what's running, but I cannot find a nice way to get nvprof to attach to a running process.
There is a --profile-all-processes flag, which does give me the message "NVPROF is profiling process 12345", but nothing further is printed. I am using CUDA 8.
How can I get a detailed performance breakdown of my CUDA kernels in this situation?
Source: https://stackoverflow.com/questions/50403436/profiling-arbitrary-cuda-applications