I'm not an expert in CUDA, but I would like to execute some code on the GPU to speed up my program. I've already used AVX2 intrinsics, but that is not enough for this critical part.