I'm trying to sum an array with this code and I'm stuck. I probably need some "CUDA for dummies" tutorial, because I have spent so much time on such a basic operation and I c
Okay, I think you need to start fresh. Take a look at this step-by-step guide from NVIDIA on reduction.
Calling the kernel like this fixes the problem.
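dim3 dimBlock(128);                        // 128 threads per block
dim3 dimGrid(N / dimBlock.x);              // integer division: covers all N elements only if N is a multiple of 128
int smemSize = dimBlock.x * sizeof(int);   // dynamic shared memory: one int per thread
sum_reduction<<<dimGrid, dimBlock, smemSize>>>(in, out, N);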
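For anyone landing here later, this is a minimal sketch of how that launch could fit into a complete program. The original sum_reduction kernel is not shown in this thread, so the kernel body below is my assumption of a typical shared-memory tree reduction, and the helper name sum_on_gpu is made up for illustration. It also assumes the block size is a power of two and that the data are ints. Unlike the launch above, the grid size is rounded up so N does not have to be a multiple of 128. Error checking is left out to keep it short.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel body: the real sum_reduction is not shown in the thread.
// Each block reduces its slice of `in` and writes one partial sum to out[blockIdx.x].
__global__ void sum_reduction(const int* in, int* out, int n) {
    extern __shared__ int sdata[];  // size comes from the third launch parameter
    unsigned int tid = threadIdx.x;
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;

    // Threads past the end of the array contribute 0.
    sdata[tid] = (i < static_cast<unsigned int>(n)) ? in[i] : 0;
    __syncthreads();

    // Tree reduction in shared memory (assumes blockDim.x is a power of two).
    for (unsigned int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) sdata[tid] += sdata[tid + s];
        __syncthreads();
    }

    if (tid == 0) out[blockIdx.x] = sdata[0];
}

// Hypothetical host helper: returns the sum of h_in[0..n).
int sum_on_gpu(const int* h_in, int n) {
    if (n <= 0) return 0;

    dim3 dimBlock(128);
    dim3 dimGrid((n + dimBlock.x - 1) / dimBlock.x);  // round up to cover every element
    size_t smemSize = dimBlock.x * sizeof(int);       // one int per thread

    int* d_in = nullptr;
    int* d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_out, dimGrid.x * sizeof(int));
    cudaMemcpy(d_in, h_in, n * sizeof(int), cudaMemcpyHostToDevice);

    sum_reduction<<<dimGrid, dimBlock, smemSize>>>(d_in, d_out, n);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    std::vector<int> partial(dimGrid.x);
    cudaMemcpy(partial.data(), d_out, dimGrid.x * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);

    // Finish on the host: add up the per-block partial sums.
    int total = 0;
    for (int p : partial) total += p;
    return total;
}
```

Calling sum_on_gpu(data.data(), data.size()) on a std::vector<int> returns the total. The final pass over the partial sums is done on the CPU here for simplicity; a second kernel launch over the partial sums would be the other common option.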