I'm trying to learn CUDA, and one of the things I want to make is a dot product kernel. The kernel works OK for dot products that produce low coefficient values in the result