I am training a BERT model on a relatively small dataset and cannot afford to lose any labelled samples, since all of them must be used for training. Due to GPU memory constraints, I am