I would like to optimize ML code (an SSD implementation in PyTorch) running on an NVIDIA Jetson Xavier NX development kit. One of the bottlenecks seems to be list slicing on PyTorch (1.6.0) tensors that live on the GPU.
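
To check whether slicing really is the bottleneck, it may help to time it in isolation. CUDA kernels launch asynchronously, so a timer that does not call `torch.cuda.synchronize()` can attribute the cost to the wrong line. The following is only a minimal sketch: the helper `time_op`, the tensor name `boxes`, its shape, and the index pattern are all hypothetical stand-ins, not taken from the actual SSD code. It falls back to the CPU when no GPU is available.

```python
import time
import torch

def time_op(fn, device, warmup=3, iters=20):
    """Return the mean wall-clock seconds per call of fn() (hypothetical helper)."""
    for _ in range(warmup):
        fn()  # warm-up calls so one-time setup costs are not measured
    if device.type == "cuda":
        torch.cuda.synchronize()  # wait for pending GPU work before starting the timer
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    if device.type == "cuda":
        torch.cuda.synchronize()  # wait for the timed GPU work to finish
    return (time.perf_counter() - start) / iters

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Assumed stand-in for box predictions; the shape is illustrative only.
boxes = torch.rand(8732, 4, device=device)
keep = list(range(0, 8732, 4))              # indices held in a Python list
keep_t = torch.tensor(keep, device=device)  # the same indices as a tensor on the device

t_slice = time_op(lambda: boxes[:2183], device)    # basic slice, returns a view
t_list = time_op(lambda: boxes[keep], device)      # indexing with a Python list
t_tensor = time_op(lambda: boxes[keep_t], device)  # indexing with an index tensor

print(f"basic slice:  {t_slice * 1e6:.1f} us")
print(f"list index:   {t_list * 1e6:.1f} us")
print(f"tensor index: {t_tensor * 1e6:.1f} us")
```

Comparing the three timings on the target device should show whether the cost comes from the indexing itself or from how the indices are supplied.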