I am trying to pretrain BERT on a dataset (wiki103) that contains 150k sentences. After 12 epochs, the NSP (next sentence prediction) task reaches an accuracy of around 0.76, and the model overfits if I continue training for more epochs.