Hey there, I have the following piece of code:
#if USE_CONST == 1
__constant__ double PNT[ SIZE ];
#else
__device__ double *PNT;
#endif
Although this is an old question I add this for future googlers:
The problem is here:
cudaMalloc((void **)&PNT, sizeof(double)*SIZE);
cudaMemcpy(PNT, point, sizeof(double)*SIZE, cudaMemcpyHostToDevice);
The cudaMalloc
writes to the host version of PNT
which is actually a device variable that must not be accessed from host. So correct would be to allocate memory, copy the address to the device symbol and copy the memory to the the memory pointed to by that symbol:
void* memPtr;
cudaMalloc(&memPtr, sizeof(double)*SIZE);
cudaMemcpyToSymbol(PNT, &memPtr, sizeof(memPtr));
// In other places you'll need an additional:
// cudaMemcpyFromSymbol(&memPtr, PNT, sizeof(memPtr));
cudaMemcpy(memPtr, point, sizeof(double)*SIZE, cudaMemcpyHostToDevice);
Easier would be:
#if USE_CONST == 1
__constant__ double PNT[ SIZE ];
#else
__device__ double PNT[ SIZE ];
#endif
// No #if required anymore:
cudaMemcpyToSymbol(PNT, point, sizeof(double)*SIZE);
The correct usage of cudaMemcpyToSymbol prior to CUDA 4.0 is:
cudaMemcpyToSymbol("PNT", point, sizeof(double)*SIZE)
or alternatively:
double *cpnt;
cudaGetSymbolAddress((void **)&cpnt, "PNT");
cudaMemcpy(cpnt, point, sizeof(double)*SIZE, cudaMemcpyHostToDevice);
which might be a bit faster if you are planning to access the symbol from the host API more than once.
EDIT: misunderstood the question. For the global memory version, do something similar to the second version for constant memory
double *gpnt;
cudaGetSymbolAddress((void **)&gpnt, "PNT");
cudaMemcpy(gpnt, point, sizeof(double)*SIZE. cudaMemcpyHostToDevice););