I want to keep data in shared memory across successive launches of the same kernel. Does shared memory persist between kernel calls?
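Shared memory only lives as long as a thread block, so it cannot carry data from one kernel launch to the next. You could try page-locked (pinned) host memory that is mapped into the device address space instead, though access should be much slower than graphics (device) memory. Allocate it with cudaHostAlloc(void **ptr, size_t size, cudaHostAllocMapped), then pass the pointer to the kernel (on systems without unified addressing, pass the device pointer obtained from cudaHostGetDevicePointer instead).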
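A minimal sketch of that approach, assuming a device that supports mapped pinned memory. The kernel `increment`, the helper `run_launches`, and the buffer size are hypothetical names and values chosen only for illustration. It is meant to show that a buffer allocated with cudaHostAllocMapped keeps its contents between two launches of the same kernel.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: every launch adds 1 to each element of the buffer.
__global__ void increment(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Hypothetical helper: launches the kernel `launches` times on one mapped
// pinned buffer and returns element 0, or -1 if a CUDA call fails.
int run_launches(int launches) {
    const int n = 256;          // illustrative buffer size
    int *host_ptr = nullptr;    // host view of the buffer
    int *dev_ptr = nullptr;     // device view of the same buffer

    // Assumption: mapping is enabled by default in the runtime; older
    // toolkits may need cudaSetDeviceFlags(cudaDeviceMapHost) first.
    if (cudaHostAlloc((void **)&host_ptr, n * sizeof(int),
                      cudaHostAllocMapped) != cudaSuccess) {
        return -1;
    }
    for (int i = 0; i < n; ++i) host_ptr[i] = 0;

    // Get the device-side address of the mapped allocation.
    if (cudaHostGetDevicePointer((void **)&dev_ptr, host_ptr, 0) != cudaSuccess) {
        cudaFreeHost(host_ptr);
        return -1;
    }

    // Launches on the same stream run in order, so each one sees the
    // values written by the previous one.
    for (int k = 0; k < launches; ++k) {
        increment<<<(n + 127) / 128, 128>>>(dev_ptr, n);
    }

    // Wait for the kernels before reading the buffer on the host.
    cudaError_t err = cudaDeviceSynchronize();
    int result = (err == cudaSuccess) ? host_ptr[0] : -1;

    cudaFreeHost(host_ptr);
    return result;
}

int main() {
    printf("value after 2 launches: %d\n", run_launches(2));
    return 0;
}
```

Every access from the kernel to this buffer goes over the host interconnect, which is why it is expected to be slower than memory that resides on the device.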