My dataset depends on a 3GB tensor. This tensor could either be on the CPU or the GPU. The bottleneck of my code is the data loading preprocessing. But I can\'t add more than a