Can't enter a __global__ function using CUDA
Question: I have written code in Nsight that compiles and runs, but the first launch does not complete. The strange thing is that when I run it in debug mode it works perfectly, although it is too slow. Here is the part of the code that runs before entering the function that accesses the GPU (where I think there is an error I can't find):

void parallelAction (int * dataReturned, char * data, unsigned char * descBase, int range, int cardBase, int streamIdx) { size_t inputBytes = range*128*sizeof(unsigned