XLA allocates 4G of memory to this tensor, and its size seems to scale with the batch size. That doesn't make sense to me, since it doesn't seem to be part of the model gr