Question
I want to observe the gst_inst_128bit counter. For the same program, nvvp reports a large number of gst_inst_128bit instructions executed, while the profiler in Nsight reports four times as many gst_inst_32bit instructions instead. Both should be profiling the same program. How can this happen?
The experiment was run on Linux with CUDA 5.0 and a GTX 580. The program only copies data from one array to another inside a kernel function.
In main:
cudaMalloc((void**)&dev_a, NUM * sizeof(float));
cudaMalloc((void**)&dev_b, NUM * sizeof(float));
kernel<<<grid,block>>>((uint4 *)dev_a, (uint4 *)dev_b);
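The kernel:
__global__ void kernel(uint4 *a, uint4 *b){
    // Global thread index across the whole grid.
    unsigned int id = blockIdx.x * THREAD_NUM + threadIdx.x;
    // Each thread copies LOOP/4 uint4 elements (16 bytes each), striding by the grid size.
    for(unsigned int i = 0; i < LOOP/4; i++){
        b[id + i * GRID_NUM * THREAD_NUM] = a[id + i * GRID_NUM * THREAD_NUM];
    }
    return;
}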
Answer 1:
The profiler in Nsight EE and the standalone Visual Profiler on Linux are based on the same codebase. Please make sure that:
- You are using the same executable in both tools.
- The environment variable values are the same in both cases (e.g. LD_LIBRARY_PATH).
Please note that the Nsight EE launch UI may be slightly confusing. When you click "Profile" after debugging the debug build, it may actually profile the debug executable, because it tries to keep all the custom launch settings (e.g. command line arguments, working folder, etc.) you may have set up. From the main menu, click Run->Profile Configurations... to see the settings Nsight uses when profiling your application.
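One way to check whether the build type matters is to profile a self-contained version of the copy program built both with and without device debug information. The following is a minimal sketch, not the asker's actual program: the values of THREAD_NUM, GRID_NUM and LOOP, the definition of NUM, and the helper copy_and_count_mismatches are assumptions made for illustration, since the question does not show them.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical sizes: the question does not give values for these macros.
#define THREAD_NUM 256                       // threads per block
#define GRID_NUM   64                        // blocks in the grid
#define LOOP       64                        // floats copied per thread (multiple of 4)
#define NUM (GRID_NUM * THREAD_NUM * LOOP)   // total floats (assumed definition)

// Same copy kernel as in the question: each thread moves LOOP/4 uint4 elements.
__global__ void kernel(const uint4 *a, uint4 *b) {
    unsigned int id = blockIdx.x * THREAD_NUM + threadIdx.x;
    for (unsigned int i = 0; i < LOOP / 4; i++) {
        b[id + i * GRID_NUM * THREAD_NUM] = a[id + i * GRID_NUM * THREAD_NUM];
    }
}

// Hypothetical helper: runs the copy once and returns the number of
// mismatching floats, or -1 if an allocation or CUDA call fails.
int copy_and_count_mismatches() {
    size_t bytes = NUM * sizeof(float);
    float *host_in = (float *)malloc(bytes);
    float *host_out = (float *)malloc(bytes);
    if (!host_in || !host_out) {
        free(host_in);
        free(host_out);
        return -1;
    }
    for (int i = 0; i < NUM; i++) host_in[i] = (float)i;
    memset(host_out, 0, bytes);

    float *dev_a = NULL, *dev_b = NULL;
    int mismatches = -1;
    if (cudaMalloc((void **)&dev_a, bytes) == cudaSuccess &&
        cudaMalloc((void **)&dev_b, bytes) == cudaSuccess &&
        cudaMemcpy(dev_a, host_in, bytes, cudaMemcpyHostToDevice) == cudaSuccess) {
        // Reinterpret the float buffers as uint4 so each access covers 16 bytes.
        kernel<<<GRID_NUM, THREAD_NUM>>>((const uint4 *)dev_a, (uint4 *)dev_b);
        // This blocking copy also waits for the kernel to finish.
        if (cudaMemcpy(host_out, dev_b, bytes, cudaMemcpyDeviceToHost) == cudaSuccess) {
            mismatches = 0;
            for (int i = 0; i < NUM; i++) {
                if (host_out[i] != host_in[i]) mismatches++;
            }
        }
    }
    cudaFree(dev_a);
    cudaFree(dev_b);
    free(host_in);
    free(host_out);
    return mismatches;
}

int main() {
    int mismatches = copy_and_count_mismatches();
    printf("mismatching floats: %d\n", mismatches);
    return mismatches == 0 ? 0 : 1;
}
```

Assuming the file is saved as copy.cu (a hypothetical name), it could be built once as `nvcc -arch=sm_20 -o copy_release copy.cu` and once as `nvcc -arch=sm_20 -G -o copy_debug copy.cu`, and each binary profiled in both tools. If the store counters follow the build rather than the tool, that would point to Nsight having profiled the debug executable. This is only a possible explanation; the thread does not confirm it.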
Source: https://stackoverflow.com/questions/14254512/nvvp-and-nsights-profiler-give-a-different-result