Why does CUDA memory copy speed behave like this? Is there some constant driver overhead?
I always see a strange 0.04 ms overhead when working with memory in CUDA on my old GeForce 8800GT. I need to transfer ~1-2 KB to the constant memory of my device, work with that data on the device, and get only a single float value back. My code follows the typical pattern for a GPU calculation:

    //allocate all the needed memory: pinned, device global
    for(int i = 0; i < 1000; i++)
    {
        //Do some heavy CPU logic (~0.005 ms long)
        cudaMemcpyToSymbolAsync(const_dev_mem, pinned_host_mem, mem_size, 0, cudaMemcpyHostToDevice
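A rough back-of-the-envelope estimate shows why this overhead matters for the loop above. It assumes that the 0.04 ms is paid once per loop iteration, which the question does not state explicitly. Under that assumption, the fixed cost outweighs the CPU work it is interleaved with by a factor of 8:

$$
\begin{aligned}
T_{\text{CPU}} &= 1000 \times 0.005\ \text{ms} = 5\ \text{ms} \\
T_{\text{overhead}} &= 1000 \times 0.04\ \text{ms} = 40\ \text{ms} \\
\frac{T_{\text{overhead}}}{T_{\text{CPU}}} &= \frac{0.04\ \text{ms}}{0.005\ \text{ms}} = 8
\end{aligned}
$$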