I\'m writing a CUDA kernel which involves calculating the maximum value on a given matrix and I\'m evaluating possibilities. The best way I could find is:
Forcing ev
NVIDIA has a CUDA demo that does reduction: here. There's a whitepaper that goes along with it that explains some motivations behind the design.