Almost everywhere I read about programming with CUDA there is a mention of how important it is that all of the threads in a warp do the same thing.
In my code I have a situation where some threads in a warp take a branch (incrementing an element of a buffer when a value falls below a threshold) while the others don't. Do I need to do anything special to handle this divergence?
The answer to your question is no, you don't need to do anything special; the hardware handles a divergent branch by serializing the two paths within the warp. That said, you can remove the divergence entirely: instead of branching, you can write something like this:
buffer[x1] += (d1 < 0.5);  // the comparison yields 0 or 1, so no branch is needed
buffer[x2] += (d2 < 0.5);  // every thread in the warp executes the same instructions
You should also check whether you can use shared memory and whether your global memory accesses are coalesced. And make sure that no two threads ever write to the same index with a plain +=; if they can, you need an atomic operation, otherwise you have a race condition.
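If it helps, here is a minimal sketch of how the branchless pattern might look inside a full kernel when different threads could map to the same output slot. The names (countBelowThreshold, d_values, d_indices, d_buffer, n, threshold) are placeholders I made up, not taken from your code:

__global__ void countBelowThreshold(const float *d_values, const int *d_indices,
                                    int *d_buffer, int n, float threshold)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        // The comparison evaluates to 0 or 1, so every thread executes the
        // same instructions and the warp never diverges on this line.
        int hit = (d_values[i] < threshold);

        // If two threads can map to the same output index, a plain "+=" is a
        // race; atomicAdd makes the update safe in that case. If the indices
        // are guaranteed unique, you can drop the atomic and use += directly.
        atomicAdd(&d_buffer[d_indices[i]], hit);
    }
}

Whether the atomic costs you anything depends on how often indices actually collide; with few collisions it is usually cheap compared to the global memory traffic itself.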