cudaMemcpyToSymbol performance
I have several functions that each copy a variable into constant device memory and then launch a kernel. I noticed that the first copy into constant memory takes about 0.6 seconds, while subsequent copies are very fast (about 0.0008 seconds). This behaviour occurs regardless of which function is called first in main. Example code below:

    __constant__ double res1;

    __global__ void kernel1(...) {...}

    void function1() {
        double resHost = 255 / ((double) size);
        CUDA_CHECK_RETURN(cudaMemcpyToSymbol(res1, &resHost, sizeof(double)));
        // prepare and launch kernel
    }

    __constant__
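
One way to check whether the 0.6 seconds belongs to the copy itself or to one-time CUDA runtime initialization (the runtime creates its context lazily on the first API call) is to force initialization first and time the copies separately. The following is a minimal sketch, not the original program: the helper `timeCopyToRes1`, the use of `cudaFree(0)` as a warm-up call, and the values being copied are assumptions chosen for illustration, and plain error checks stand in for the `CUDA_CHECK_RETURN` macro.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

__constant__ double res1;  // same symbol name as in the question

// Hypothetical helper: wall-clock seconds for one copy of `value` into res1.
// Returns a negative value if the CUDA call fails.
static double timeCopyToRes1(double value) {
    auto start = std::chrono::steady_clock::now();
    cudaError_t err = cudaMemcpyToSymbol(res1, &value, sizeof(double));
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();  // make sure the transfer has finished
    }
    auto stop = std::chrono::steady_clock::now();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "copy failed: %s\n", cudaGetErrorString(err));
        return -1.0;
    }
    return std::chrono::duration<double>(stop - start).count();
}

int main() {
    // Warm-up: a harmless runtime call that forces context creation,
    // so its cost is measured on its own instead of inside the first copy.
    auto start = std::chrono::steady_clock::now();
    cudaError_t err = cudaFree(0);
    auto stop = std::chrono::steady_clock::now();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "init failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    double init = std::chrono::duration<double>(stop - start).count();

    // Arbitrary illustrative values; the question computes 255 / size.
    double first = timeCopyToRes1(255.0 / 1024.0);
    double second = timeCopyToRes1(255.0 / 2048.0);
    if (first < 0.0 || second < 0.0) {
        return 1;
    }

    std::printf("initialization: %f s\n", init);
    std::printf("first copy:     %f s\n", first);
    std::printf("second copy:    %f s\n", second);
    return 0;
}
```

If the initialization line absorbs most of the delay and both copies come out similarly fast, the slow first call is likely startup overhead rather than a property of `cudaMemcpyToSymbol` or of constant memory.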