I need to find the index of the maximum element in an array of floats. I am using the function "cublasIsamax", but this returns the index to the CPU, and this is slowing down
If you want to use CUBLAS and you have a GPU with compute capability 3.5 or higher (K20, Titan), then you can use CUBLAS with dynamic parallelism. This lets you call CUBLAS from within a kernel on the GPU, so no data is returned to the CPU. If you don't have a device with compute capability 3.5, you will probably have to implement a find-max function yourself or look for an additional library.
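If you go the do-it-yourself route, a minimal sketch is a single-block reduction kernel that writes the index into device memory, so a following kernel can read it without a round trip to the CPU. The names `argmaxKernel`, `isBetter` and `deviceArgmax` are hypothetical. Note that the semantics differ from `cublasIsamax`: cuBLAS compares absolute values and returns a 1-based index, while this sketch compares signed values and returns a 0-based index (apply `fabsf` to each element and add 1 to the result if you need the cuBLAS behaviour).

```cuda
#include <cassert>
#include <cuda_runtime.h>

constexpr int kThreads = 256;  // block size; must be a power of two for the tree reduction

// Returns true if candidate (val, idx) should replace the current best.
// idx < 0 means "no element seen yet"; ties go to the smaller index.
__device__ bool isBetter(float val, int idx, float bestVal, int bestIdx)
{
    if (idx < 0) return false;
    if (bestIdx < 0) return true;
    return val > bestVal || (val == bestVal && idx < bestIdx);
}

// Hypothetical single-block argmax: writes the 0-based index of the largest
// element of data[0..n) to *outIndex, which lives in device memory.
__global__ void argmaxKernel(const float* data, int n, int* outIndex)
{
    __shared__ float sVal[kThreads];
    __shared__ int   sIdx[kThreads];

    const int tid = threadIdx.x;

    // Phase 1: each thread scans a strided slice of the input.
    float bestVal = 0.0f;
    int   bestIdx = -1;
    for (int i = tid; i < n; i += kThreads) {
        float v = data[i];
        if (isBetter(v, i, bestVal, bestIdx)) {
            bestVal = v;
            bestIdx = i;
        }
    }
    sVal[tid] = bestVal;
    sIdx[tid] = bestIdx;
    __syncthreads();

    // Phase 2: tree reduction in shared memory; the winner ends up in slot 0.
    for (int s = kThreads / 2; s > 0; s >>= 1) {
        if (tid < s && isBetter(sVal[tid + s], sIdx[tid + s], sVal[tid], sIdx[tid])) {
            sVal[tid] = sVal[tid + s];
            sIdx[tid] = sIdx[tid + s];
        }
        __syncthreads();
    }

    if (tid == 0) *outIndex = sIdx[0];
}

// Hypothetical host wrapper: launches the kernel asynchronously on `stream`.
// The index stays on the GPU; copy it back only if the CPU really needs it.
void deviceArgmax(const float* d_data, int n, int* d_index, cudaStream_t stream = 0)
{
    assert(n >= 1);
    argmaxKernel<<<1, kThreads, 0, stream>>>(d_data, n, d_index);
}
```

Launching a single block keeps the code short but uses only one multiprocessor; for very large arrays a two-pass version (one partial result per block, then a second reduction over those partial results) would likely be faster.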