I tried writing a CUDA kernel to move data from one GPU memory buffer to another. Basically, I need to apply a window function to that data before performing the FFT. I was just c
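For reference, here is a minimal sketch of such a kernel. It assumes both buffers are already in device global memory, that the samples are real `float`s, and that a Hann window is wanted (the post does not say which window). The names `windowedCopy` and `windowedCopyHost` are hypothetical. Each thread reads a sample from the source buffer, multiplies it by its window coefficient, and writes the result to the destination buffer, so the device-to-device copy and the windowing happen in a single pass over memory.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cmath>

// Copy n samples from src to dst (both device pointers) while applying a
// Hann window (assumed window type): w[i] = 0.5 * (1 - cos(2*pi*i/(n-1))).
// A grid-stride loop lets any grid size cover all n samples.
__global__ void windowedCopy(const float* __restrict__ src,
                             float* __restrict__ dst, int n)
{
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += blockDim.x * gridDim.x) {
        // cospif(x) computes cos(pi * x); n == 1 degenerates to w = 1.
        float w = (n > 1) ? 0.5f * (1.0f - cospif(2.0f * i / (n - 1))) : 1.0f;
        dst[i] = src[i] * w;
    }
}

// Hypothetical test helper: uploads `in`, runs the device-to-device
// windowing kernel, and downloads the result into `out`.
// Returns 0 on success, -1 on invalid input or any CUDA error.
int windowedCopyHost(const float* in, float* out, int n)
{
    assert(in != nullptr && out != nullptr);
    if (n <= 0) return -1;  // a launch with 0 blocks would be invalid

    float *d_src = nullptr, *d_dst = nullptr;
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);

    bool ok = cudaMalloc(&d_src, bytes) == cudaSuccess &&
              cudaMalloc(&d_dst, bytes) == cudaSuccess &&
              cudaMemcpy(d_src, in, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;
        windowedCopy<<<blocks, threads>>>(d_src, d_dst, n);
        // cudaMemcpy waits for the kernel, so a kernel fault surfaces here too.
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(out, d_dst, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_src);  // cudaFree(nullptr) is a no-op
    cudaFree(d_dst);
    return ok ? 0 : -1;
}
```

In an actual pipeline the data is already on the device, so you would launch `windowedCopy` directly on the two existing device pointers and hand the destination buffer to the FFT (for example `cufftExecR2C`); the host helper only exists to make the sketch easy to test. Computing each coefficient on the fly avoids storing a window table; precomputing the window once into a device array is an equally reasonable choice if the same length is reused many times.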