I am trying to measure time inside a CUDA kernel, following the approach in https://stackoverflow.com/a/43010589/8044236. To check the correctness of the measurements, I implemented the f