I'm currently writing a CUDA kernel for a custom operation (an activation) for PyTorch, but I'm quite unfamiliar with any form of GPU programming. For reference, I was followi