I am trying to use ILGPU to make my code run faster. With my current implementation, it runs around 60% faster when I use a CUDA accelerator, but when I run it with a CPU acce