I\'m writing a CUDA kernel which involves calculating the maximum value on a given matrix and I\'m evaluating possibilities. The best way I could find is:
The atomicAdd function could also be used, but it is much less efficient than the approaches mentioned above. http://supercomputingblog.com/cuda/cuda-tutorial-4-atomic-operations/