I have a main global memory (gpu_mem), along with a variable (gpu_mem_offset) to track the current offset of this global memory where a thread will upd
The only way to ensure a consistent update of both counter and array in that example is like this:
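__global__ void kernel(int *gpu_mem, int *gpu_mem_offset)
{
    // atomicAdd returns the counter's value from before the increment,
    // so each thread reserves a distinct slot.
    int offset = atomicAdd(gpu_mem_offset, 1);
    // some_value is a placeholder for whatever this thread needs to store.
    gpu_mem[offset] = some_value;
}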
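That is, if you need atomic updates, use an atomic intrinsic; that is what they are for. Here the atomic access to gpu_mem_offset ensures that every thread gets a unique value of the offset: atomicAdd returns the value the counter held before the increment, and no two threads can observe the same one. The write is then safe, because each thread accesses a unique index (provided gpu_mem is large enough to hold one element per thread).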
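For anyone who wants to check this, here is a rough sketch of a host-side test built around the same pattern. The names append_kernel and run_append, the bounds parameter n, and the choice to store the thread id in place of some_value are all assumptions made for illustration; none of them come from the original code.

```cuda
#include <algorithm>
#include <vector>
#include <cuda_runtime.h>

// Same pattern as above: reserve a slot with atomicAdd, then write to it.
// The thread id stands in for some_value (an assumption for this sketch).
__global__ void append_kernel(int *gpu_mem, int *gpu_mem_offset, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;                       // skip surplus threads in the last block
    int offset = atomicAdd(gpu_mem_offset, 1);  // old counter value, unique per thread
    gpu_mem[offset] = tid;
}

// Hypothetical helper: launches n threads and returns true if the counter ends
// at n and every thread id landed in the array exactly once.
bool run_append(int n)
{
    int *gpu_mem = nullptr;
    int *gpu_mem_offset = nullptr;
    if (cudaMalloc(&gpu_mem, n * sizeof(int)) != cudaSuccess) return false;
    if (cudaMalloc(&gpu_mem_offset, sizeof(int)) != cudaSuccess) {
        cudaFree(gpu_mem);
        return false;
    }
    cudaMemset(gpu_mem_offset, 0, sizeof(int)); // counter starts at zero

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    append_kernel<<<blocks, threads>>>(gpu_mem, gpu_mem_offset, n);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    std::vector<int> host(n);
    int final_offset = 0;
    cudaError_t e1 = cudaMemcpy(host.data(), gpu_mem, n * sizeof(int),
                                cudaMemcpyDeviceToHost);
    cudaError_t e2 = cudaMemcpy(&final_offset, gpu_mem_offset, sizeof(int),
                                cudaMemcpyDeviceToHost);
    cudaFree(gpu_mem);
    cudaFree(gpu_mem_offset);
    if (e1 != cudaSuccess || e2 != cudaSuccess || final_offset != n) return false;

    // Slot order is not deterministic, so sort before comparing.
    std::sort(host.begin(), host.end());
    for (int i = 0; i < n; ++i) {
        if (host[i] != i) return false;
    }
    return true;
}
```

Calling run_append(1 << 20) from main should return true on a working device. Note that the atomic only guarantees uniqueness of the slots, not their order: which thread ends up at which index can change from run to run.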