Timing a CUDA application using events

耶瑟儿~ 2021-02-10 14:27

I am using the following two functions to time different parts (cudaMemcpyHtoD, kernel execution, cudaMemcpyDtoH) of my code (which includes multi-gpus, concurrent kernels on sa

1 Answer
  • 2021-02-10 15:08

    First, if this is for production code, you may want to do useful CPU work between the second cudaEventRecord() and cudaEventSynchronize(). Calling them back to back makes the CPU wait for the GPU, which reduces your app's ability to overlap GPU and CPU work.

    Next, I would separate event creation and destruction from event recording. I'm not sure of the cost, but in general you might not want to call cudaEventCreate and cudaEventDestroy often.

    What I would do is create a class like this:

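    #include <cassert>
    #include <cuda_runtime.h>

    // Times GPU work with a reusable pair of CUDA events.
    class EventTimer {
    public:
      EventTimer() : mStarted(false), mStopped(false) {
        cudaEventCreate(&mStart);
        cudaEventCreate(&mStop);
      }
      ~EventTimer() {
        cudaEventDestroy(mStart);
        cudaEventDestroy(mStop);
      }
      // The timer owns its events, so copying is disabled to avoid destroying them twice.
      EventTimer(const EventTimer&) = delete;
      EventTimer& operator=(const EventTimer&) = delete;

      // Marks the start of the timed region in stream s.
      void start(cudaStream_t s = 0) {
        cudaEventRecord(mStart, s);
        mStarted = true;
        mStopped = false;
      }
      // Marks the end of the timed region in stream s.
      void stop(cudaStream_t s = 0) {
        assert(mStarted);
        cudaEventRecord(mStop, s);
        mStarted = false;
        mStopped = true;
      }
      // Blocks until the stop event completes, then returns the elapsed time in milliseconds.
      float elapsed() {
        assert(mStopped);
        if (!mStopped) return 0;  // still guards release builds, where assert is compiled out
        cudaEventSynchronize(mStop);
        float elapsed = 0;
        cudaEventElapsedTime(&elapsed, mStart, mStop);
        return elapsed;
      }

    private:
      bool mStarted, mStopped;
      cudaEvent_t mStart, mStop;
    };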
    

    Note I didn't include cudaSetDevice() -- seems to me that should be left to the code that uses this class, to make it more flexible. The user would have to ensure the same device is active when start and stop are called.

    PS: It is not NVIDIA's intent for CUTIL to be relied upon for production code -- it is used simply for convenience in our examples and is not as rigorously tested or optimized as the CUDA libraries and compilers themselves. I recommend you extract things like cutilSafeCall() into your own libraries and headers.
