Is global memory write considered atomic or not in CUDA?
Considering the following CUDA kernel code:
int idx = blockIdx.x*blockDim.x+threadIdx.x;
int gid
Memory accesses in CUDA are not implicitly atomic. However, the code you originally showed isn't intrinsically a memory race, as long as idx
has a unique value for each thread in the running kernel.
So your original code:
int idx = blockIdx.x*blockDim.x+threadIdx.x;
globalStorage[idx] = somefunction(idx);
would be safe if the kernel launch uses a 1D grid and globalStorage
is suitably sized, whereas your second version:
int idx = blockIdx.x*blockDim.x+threadIdx.x;
int gidx = idx%1000;
globalStorage[gidx] = somefunction(idx);
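would not be, because multiple threads could potentially write to the same entry in globalStorage. There are no atomic protections or serialisation mechanisms which would produce predictable results in such a case: each contested entry ends up holding one of the written values, but which thread's value survives is undefined.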
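Here is a minimal sketch of the three situations in CUDA C++, assuming a 1D launch. `somefunction` is a hypothetical placeholder (it just returns `2 * i`), and the names `kSlots`, `kThreads`, `launch`, `runUniqueWrite`, `runCollidingWrite` and `runAtomicCount` are made up for illustration. The atomic variant only helps when the colliding contributions can be combined (here they are counted with `atomicAdd`). It does not make a plain overwrite predictable. Error checking is omitted for brevity.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

constexpr int kSlots = 1000;   // matches the question's idx % 1000
constexpr int kThreads = 256;  // arbitrary block size chosen for this sketch

// Hypothetical stand-in for the question's somefunction().
__device__ int somefunction(int i) { return 2 * i; }

// Safe: in a 1D grid idx is unique per thread, so no two threads share an
// element. The bounds check handles the last, partially filled block.
__global__ void uniqueWrite(int *globalStorage, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) globalStorage[idx] = somefunction(idx);
}

// Racy: many threads store to each slot. The stores are not ordered
// against each other, so which thread's value survives is undefined.
__global__ void collidingWrite(int *globalStorage, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) globalStorage[idx % kSlots] = somefunction(idx);
}

// Deterministic when the contributions can be combined: atomicAdd makes
// each read-modify-write on a slot indivisible, so no increment is lost.
__global__ void atomicCount(int *counts, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) atomicAdd(&counts[idx % kSlots], 1);
}

// Hypothetical host helper: launches `kernel` with n threads on a zeroed
// device buffer of `len` ints and copies the buffer back to the host.
std::vector<int> launch(void (*kernel)(int *, int), int n, int len) {
    int *d = nullptr;
    cudaMalloc(&d, len * sizeof(int));
    cudaMemset(d, 0, len * sizeof(int));
    int blocks = (n + kThreads - 1) / kThreads;
    kernel<<<blocks, kThreads>>>(d, n);
    std::vector<int> h(len);
    // cudaMemcpy on the default stream waits for the kernel to finish.
    cudaMemcpy(h.data(), d, len * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);
    return h;
}

std::vector<int> runUniqueWrite(int n) { return launch(uniqueWrite, n, n); }

std::vector<int> runCollidingWrite(int n) {
    assert(n >= kSlots);  // so every slot receives at least one store
    return launch(collidingWrite, n, kSlots);
}

std::vector<int> runAtomicCount(int n) { return launch(atomicCount, n, kSlots); }
```

Called with, say, n = 100000, `runUniqueWrite` returns `2 * i` at every index and `runAtomicCount` reports exactly 100 per slot. `runCollidingWrite` only guarantees that each slot holds some `2 * idx` with `idx % 1000` equal to the slot index; which `idx` wins is not specified and may differ between runs.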