I just started with CUDA and I have a question. I have an N*N matrix and a window size of 8x8. I want to subdivide this matrix into multiple sub-matrices and find the max value of thi
If you're willing to use a library, here are a few pointers:
NPP, a set of primitives from NVIDIA that includes a max filter for images: https://docs.nvidia.com/cuda/npp/group__image__filter__max.html
CUB, a lower-level library from NVIDIA / NVlabs, for other reduce operations and finer control over how you use the hardware: http://nvlabs.github.io/cub/
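
If you would rather see what such a reduction looks like without a library, here is a minimal hand-written sketch. It is not NPP or CUB code. It assumes non-overlapping 8x8 tiles, a row-major float matrix, and n > 0; the names tileMaxKernel and tileMax are made up for this example, and error checking is left out to keep it short.

```cuda
#include <cassert>
#include <cfloat>
#include <vector>
#include <cuda_runtime.h>

constexpr int TILE = 8;  // window edge from the question (8x8)

// One 8x8 thread block per tile: each thread loads one element, then the
// block reduces its 64 values to a single maximum in shared memory.
__global__ void tileMaxKernel(const float* in, float* out, int n, int tilesPerRow)
{
    __shared__ float s[TILE * TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    int tid = threadIdx.y * TILE + threadIdx.x;

    // Threads that fall outside the matrix (n not a multiple of 8)
    // contribute -FLT_MAX so they never win the max.
    s[tid] = (row < n && col < n) ? in[row * n + col] : -FLT_MAX;
    __syncthreads();

    // Tree reduction: 64 -> 32 -> 16 -> 8 -> 4 -> 2 -> 1 values.
    for (int stride = TILE * TILE / 2; stride > 0; stride /= 2) {
        if (tid < stride) {
            s[tid] = fmaxf(s[tid], s[tid + stride]);
        }
        __syncthreads();
    }

    if (tid == 0) {
        out[blockIdx.y * tilesPerRow + blockIdx.x] = s[0];
    }
}

// Hypothetical host wrapper (assumption, not a library API): copies the
// matrix to the device, launches one block per tile, and returns one
// maximum per tile in row-major tile order.
std::vector<float> tileMax(const std::vector<float>& m, int n)
{
    assert(n > 0 && m.size() == static_cast<size_t>(n) * n);

    int tiles = (n + TILE - 1) / TILE;  // tiles per row and per column
    std::vector<float> result(static_cast<size_t>(tiles) * tiles);

    float* dIn = nullptr;
    float* dOut = nullptr;
    cudaMalloc(&dIn, m.size() * sizeof(float));
    cudaMalloc(&dOut, result.size() * sizeof(float));
    cudaMemcpy(dIn, m.data(), m.size() * sizeof(float), cudaMemcpyHostToDevice);

    dim3 block(TILE, TILE);
    dim3 grid(tiles, tiles);
    tileMaxKernel<<<grid, block>>>(dIn, dOut, n, tiles);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(result.data(), dOut, result.size() * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(dIn);
    cudaFree(dOut);
    return result;
}
```

Calling tileMax(m, 16) on a 16x16 matrix would return 4 values, one per 8x8 tile. A 64-thread block is small, so for large matrices a library routine or a kernel that handles several tiles per block is likely to use the hardware better; the sketch is only meant to show the idea.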