I'm having trouble figuring out how to code this. I need to transpose a matrix using a CUDA kernel. After that, I have to do the same thing but with an image as the input (trans
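Here is a minimal sketch of the matrix part. It assumes a row-major matrix and uses a tiled shared-memory transpose, which is a common CUDA technique and not necessarily what your assignment expects. The names `TILE_DIM`, `transposeTiled`, `transposeOnDevice` and `CUDA_CHECK` are hypothetical. The kernel is templated on the element type. If the image is stored as an interleaved 8-bit RGB buffer (an assumption), the same code can be instantiated with `uchar3` for the image part.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical tile edge length; 32 matches the warp size.
constexpr int TILE_DIM = 32;

// Hypothetical helper macro: abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            std::fprintf(stderr, "CUDA error %s at %s:%d\n",              \
                         cudaGetErrorString(err_), __FILE__, __LINE__);   \
            std::exit(EXIT_FAILURE);                                      \
        }                                                                 \
    } while (0)

// Transposes the row-major rows x cols array `in` into the cols x rows array `out`.
// Each block stages one TILE_DIM x TILE_DIM tile in shared memory so that both the
// global read and the global write touch consecutive addresses (coalesced access).
template <typename T>
__global__ void transposeTiled(const T* in, T* out, int rows, int cols) {
    // The extra column avoids shared-memory bank conflicts for 4-byte types like float.
    __shared__ T tile[TILE_DIM][TILE_DIM + 1];

    int x = blockIdx.x * TILE_DIM + threadIdx.x;  // column in `in`
    int y = blockIdx.y * TILE_DIM + threadIdx.y;  // row in `in`
    if (x < cols && y < rows) {
        tile[threadIdx.y][threadIdx.x] = in[y * cols + x];
    }
    __syncthreads();  // whole tile must be loaded before any thread reads it back

    // Swap the block indices: this block writes the mirrored tile of `out`.
    x = blockIdx.y * TILE_DIM + threadIdx.x;  // column in `out` (bounded by rows)
    y = blockIdx.x * TILE_DIM + threadIdx.y;  // row in `out` (bounded by cols)
    if (x < rows && y < cols) {
        out[y * rows + x] = tile[threadIdx.x][threadIdx.y];
    }
}

// Hypothetical host wrapper: copies the data to the GPU, launches the kernel,
// and returns the transposed cols x rows array.
template <typename T>
std::vector<T> transposeOnDevice(const std::vector<T>& h_in, int rows, int cols) {
    assert(rows > 0 && cols > 0 && h_in.size() == static_cast<size_t>(rows) * cols);
    size_t bytes = h_in.size() * sizeof(T);
    T* d_in = nullptr;
    T* d_out = nullptr;
    CUDA_CHECK(cudaMalloc(&d_in, bytes));
    CUDA_CHECK(cudaMalloc(&d_out, bytes));
    CUDA_CHECK(cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice));

    // One thread per element; grid x covers columns, grid y covers rows of `in`.
    dim3 block(TILE_DIM, TILE_DIM);
    dim3 grid((cols + TILE_DIM - 1) / TILE_DIM, (rows + TILE_DIM - 1) / TILE_DIM);
    transposeTiled<T><<<grid, block>>>(d_in, d_out, rows, cols);
    CUDA_CHECK(cudaGetLastError());

    std::vector<T> h_out(h_in.size());
    // cudaMemcpy waits for the kernel to finish before copying back.
    CUDA_CHECK(cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_in));
    CUDA_CHECK(cudaFree(d_out));
    return h_out;
}
```

For example, `transposeOnDevice(std::vector<float>{1, 2, 3, 4, 5, 6}, 2, 3)` should return `{1, 4, 2, 5, 3, 6}`. A naive kernel that writes `out[x * rows + y] = in[y * cols + x]` directly also produces the right result, but its writes are strided. The shared-memory tile exists only to make both reads and writes coalesced.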