How to observe CUDA events and metrics for a subsection of an executable (e.g. only during a kernel execution time)?

Posted by 社会主义新天地 on 2019-12-01 07:39:53

You may want to read the profiler documentation.

You can turn profiling on and off within an executable. The CUDA runtime API calls for this are cudaProfilerStart() and cudaProfilerStop().

So, if you wanted to collect profile information only for a specific kernel, you could do:

#include <cuda_profiler_api.h>
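
A minimal sketch of that pattern follows, bracketing a single kernel launch with the start and stop calls. The kernel scale_kernel, the helper run_profiled, and the problem size are hypothetical names chosen for illustration; only cudaProfilerStart() and cudaProfilerStop() are the calls this answer is about.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <cuda_profiler_api.h>

// Hypothetical kernel, for illustration only: scales each element in place.
__global__ void scale_kernel(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;
    }
}

// Hypothetical helper: allocates a buffer, then profiles only the kernel launch.
cudaError_t run_profiled(int n)
{
    float *d_data = nullptr;
    cudaError_t err = cudaMalloc(&d_data, n * sizeof(float));
    if (err != cudaSuccess) {
        return err;
    }
    cudaMemset(d_data, 0, n * sizeof(float));  // setup, outside the profiled region

    cudaProfilerStart();                       // profile collection begins here
    scale_kernel<<<(n + 255) / 256, 256>>>(d_data, 2.0f, n);
    err = cudaDeviceSynchronize();             // let the kernel finish inside the region
    cudaProfilerStop();                        // profile collection ends here

    cudaFree(d_data);
    return err;
}

int main()
{
    cudaError_t err = run_profiled(1 << 20);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

Assuming the file is saved as profiled_region.cu (a made-up name), it could be built with nvcc -o profiled_region profiled_region.cu and run as nvprof --profile-from-start off ./profiled_region, so that nothing outside the start/stop pair is collected.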


Excerpting from the documentation:

When using the start and stop functions, you also need to instruct the profiling tool to disable profiling at the start of the application. For nvprof you do this with the --profile-from-start off flag. For the Visual Profiler you use the Start execution with profiling enabled checkbox in the Settings View.

Also from the documentation for nvprof specifically, you can limit event/metric tabulation to a single kernel with a command line switch:

 --kernels <kernel name>

The documentation gives additional usage possibilities.

After looking into this a bit more, it turns out that kernel-level information is reported for every kernel, without using --kernels to name each one, by running

nvprof --events <event names> --metrics <metric names> ./<cuda benchmark>   

In fact, it gives output of the form

"Device","Kernel","Invocations","Event Name","Min","Max","Avg"

If a kernel is called multiple times in the benchmark, this lets you see the Min, Max, and Avg of the requested events across those kernel runs. Evidently the --kernels option in the CUDA 7.5 profiler allows each run of each kernel to be specified individually.
