I am comparing the execution times of a serial code and a CUDA code. The problem is that the CUDA version is much slower than the serial one. The serial code needs 9 sec by t