Question about modifying a flag array in CUDA
Question

I am doing research on GPU programming and have a question about modifying a global array from within a kernel's threads.

    __device__ float data[10] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 1};

    __global__ void gradually_set_global_data()
    {
        while (1) {
            if (data[threadIdx.x + 1]) {
                atomicAdd(&data[threadIdx.x], data[threadIdx.x + 1]);
                break;
            }
        }
    }

    int main()
    {
        gradually_set_global_data<<<1, 9>>>();
        cudaDeviceReset();
        return 0;
    }

I expect the kernel to finish with data holding [1,1,1,1,1,1,1,1,1,1], but instead it gets stuck and never completes.
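One possible explanation for the hang, offered as a sketch rather than a confirmed diagnosis: `data` is not `volatile`, so the compiler is free to load `data[threadIdx.x + 1]` once and keep it in a register, in which case the spinning threads never observe their neighbour's update. The version below re-reads the flag through a `volatile` pointer on every iteration and synchronizes before copying the result back to the host. The names `flags` and `run_chain` are made up for illustration. It also assumes a GPU with independent thread scheduling (Volta or newer); on older architectures a spin loop between threads of the same warp can still deadlock.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__device__ float data[10] = {0, 0, 0, 0, 0, 0, 0, 0, 0, 1};

__global__ void gradually_set_global_data()
{
    // Assumption: the volatile view forces a real memory load on every
    // iteration instead of a value cached in a register.
    volatile float *flags = data;
    const int i = threadIdx.x;

    while (true) {
        const float next = flags[i + 1];
        if (next != 0.0f) {
            // Propagate the neighbour's value into this thread's slot.
            atomicAdd(&data[i], next);
            break;
        }
    }
}

// Hypothetical helper: launches the kernel and copies data[] into out.
// Returns true if every CUDA call succeeded.
bool run_chain(float out[10])
{
    gradually_set_global_data<<<1, 9>>>();
    if (cudaGetLastError() != cudaSuccess) return false;        // launch error
    if (cudaDeviceSynchronize() != cudaSuccess) return false;   // wait for kernel
    return cudaMemcpyFromSymbol(out, data, 10 * sizeof(float)) == cudaSuccess;
}

int main()
{
    float host[10];
    if (!run_chain(host)) {
        printf("CUDA error: %s\n", cudaGetErrorString(cudaGetLastError()));
        return 1;
    }
    for (int i = 0; i < 10; ++i) printf("%g ", host[i]);
    printf("\n");
    cudaDeviceReset();
    return 0;
}
```

The explicit `cudaDeviceSynchronize()` and the copy back to the host are there so that the program actually waits for the kernel and the contents of `data` can be inspected. If the sketch still hangs on a given device, the intra-warp spin-wait itself is the more likely culprit, and splitting the dependency chain across separate kernel launches would be the safer design.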