I tried the code in this link, Is CUDA pinned memory zero-copy? The person who asked claims the program worked fine for him, but it does not work the same way on mine; the values do
First of all, to allocate zero-copy memory, you have to pass the cudaHostAllocMapped flag as an argument to cudaHostAlloc:
cudaHostAlloc((void **)&pinnedHostPtr, THREADS * sizeof(double), cudaHostAllocMapped);
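A minimal sketch of the allocation step with error checking, assuming device 0 and the CUDA runtime API. It asks for cudaDeviceMapHost before anything else touches the device (the runtime documentation ties cudaHostAllocMapped to that flag), checks the canMapHostMemory property, and then confirms the mapped flag with cudaHostGetFlags. The helper name allocMapped is hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: allocates `count` doubles of mapped (zero-copy)
// pinned host memory usable by device 0. Returns nullptr on failure.
double* allocMapped(size_t count)
{
    // Request host-memory mapping before the CUDA context is created.
    // If a context already exists this can fail with
    // cudaErrorSetOnActiveProcess; clear that error and carry on.
    if (cudaSetDeviceFlags(cudaDeviceMapHost) != cudaSuccess)
        cudaGetLastError();

    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess || !prop.canMapHostMemory) {
        fprintf(stderr, "device 0 cannot map host memory\n");
        return nullptr;
    }

    double* hostPtr = nullptr;
    cudaError_t err = cudaHostAlloc((void**)&hostPtr, count * sizeof(double),
                                    cudaHostAllocMapped);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaHostAlloc failed: %s\n", cudaGetErrorString(err));
        return nullptr;
    }

    // Double-check that the allocation really carries the mapped flag.
    unsigned int flags = 0;
    if (cudaHostGetFlags(&flags, hostPtr) == cudaSuccess && !(flags & cudaHostAllocMapped))
        fprintf(stderr, "warning: allocation is not mapped\n");

    return hostPtr;
}
```

The buffer returned here is released with cudaFreeHost, not free.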
Still, pinnedHostPtr will only be used to access the mapped memory from the host side. To access the same memory from the device, you have to get a device-side pointer to it like this:
double* dPtr;
cudaHostGetDevicePointer(&dPtr, pinnedHostPtr, 0);
Pass this pointer as the kernel argument:
testPinnedMemory<<< numBlocks, threadsPerBlock>>>(dPtr);
Also, you have to synchronize the kernel execution with the host before reading the updated values. Just add cudaDeviceSynchronize() after the kernel call.
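The synchronization matters because a kernel launch is asynchronous: the <<<...>>> call returns to the host right away, so reading pinnedHostPtr immediately afterwards can still show the old values. A sketch of the launch-then-wait pattern with error checks, assuming a device pointer already obtained through cudaHostGetDevicePointer; the names addTen and launchAndWait are hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: adds 10 to each element, one thread per element.
__global__ void addTen(double* mem, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        mem[i] += 10.0;
}

// Launches addTen on mapped memory and blocks until it has finished.
// Returns the first error encountered, or cudaSuccess.
cudaError_t launchAndWait(double* devPtr, int n)
{
    const int threads = 128;
    const int blocks = (n + threads - 1) / threads;
    addTen<<<blocks, threads>>>(devPtr, n);

    // Reports launch errors (e.g. a bad configuration) immediately.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess)
        return err;

    // Waits for the kernel and reports errors raised while it ran.
    // Only after this returns are the kernel's writes guaranteed to be
    // visible through the host pointer.
    return cudaDeviceSynchronize();
}
```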
The code in the linked question works because the person who asked it is running it on a 64-bit OS with a GPU of Compute Capability 2.0 and TCC enabled. This configuration automatically enables Unified Virtual Addressing (UVA), in which the host and the device share a single virtual address space instead of separate ones, so host pointers allocated with cudaHostAlloc can be passed directly to the kernel.
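To find out which case applies on a given machine, you can query the unifiedAddressing field of cudaDeviceProp. A minimal sketch, assuming device 0 and a buffer allocated with cudaHostAlloc(..., cudaHostAllocMapped); the helper name kernelPointerFor is hypothetical. With UVA it passes the host pointer straight through, otherwise it falls back to cudaHostGetDevicePointer.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: returns a pointer that kernels on device 0 can use
// to reach `hostPtr` (from cudaHostAlloc with cudaHostAllocMapped).
// Returns nullptr on failure.
double* kernelPointerFor(double* hostPtr)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess)
        return nullptr;

    if (prop.unifiedAddressing) {
        // UVA: host and device share one virtual address space, so memory
        // from cudaHostAlloc is usable in kernels through the host pointer.
        return hostPtr;
    }

    // No UVA: ask the runtime for the device-side address of the mapping.
    void* devPtr = nullptr;
    cudaError_t err = cudaHostGetDevicePointer(&devPtr, hostPtr, 0);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaHostGetDevicePointer failed: %s\n",
                cudaGetErrorString(err));
        return nullptr;
    }
    return static_cast<double*>(devPtr);
}
```

Always going through cudaHostGetDevicePointer, as the final code does, is the portable choice; the UVA shortcut only explains why the original code happened to work on the asker's machine.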
In your case, the final code will look like this:
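#include <cstdio>
#include <cuda_runtime.h>

__global__ void testPinnedMemory(double * mem)
{
    // Each thread reads, prints and updates one element of the mapped buffer.
    double currentValue = mem[threadIdx.x];
    printf("Thread id: %u, memory content: %f\n", threadIdx.x, currentValue);
    mem[threadIdx.x] = currentValue + 10;
}

int main()
{
    const size_t THREADS = 8;

    // Enable mapping of pinned host memory before the CUDA context is created;
    // the runtime docs tie cudaHostAllocMapped to this flag.
    cudaSetDeviceFlags(cudaDeviceMapHost);

    // Allocate pinned host memory that is mapped into the device address space.
    double * pinnedHostPtr;
    cudaHostAlloc((void **)&pinnedHostPtr, THREADS * sizeof(double), cudaHostAllocMapped);

    // Set memory values from the host.
    for (size_t i = 0; i < THREADS; ++i)
        pinnedHostPtr[i] = i;

    // Get the device-side pointer to the same memory.
    double* dPtr;
    cudaHostGetDevicePointer(&dPtr, pinnedHostPtr, 0);

    // Call the kernel with the device pointer.
    dim3 threadsPerBlock(THREADS);
    dim3 numBlocks(1);
    testPinnedMemory<<< numBlocks, threadsPerBlock>>>(dPtr);

    // Wait for the kernel to finish before reading its results on the host.
    cudaDeviceSynchronize();

    // Read the output.
    printf("Data after kernel execution: ");
    for (size_t i = 0; i < THREADS; ++i)
        printf("%f ", pinnedHostPtr[i]);
    printf("\n");

    cudaFreeHost(pinnedHostPtr);
    return 0;
}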