Just to get a better understanding of how BERT works, I wanted to know:
In Next Sentence Prediction (NSP) training, does the model get only 2 sentences at a time?
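For context, here is a minimal sketch of what a single NSP example looks like, assuming the Hugging Face `transformers` library (just an illustration, not BERT's original pretraining code): one sentence pair packed as `[CLS] A [SEP] B [SEP]`, plus a binary label saying whether B really follows A.

```python
# Sketch of one NSP training example using Hugging Face transformers (illustrative).
import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")

# Hypothetical sentence pair: B is the true continuation of A.
sentence_a = "The cat sat on the mat."
sentence_b = "It purred quietly and fell asleep."

# The tokenizer packs the pair into one sequence; token_type_ids mark
# which tokens belong to sentence A (0) and which to sentence B (1).
inputs = tokenizer(sentence_a, sentence_b, return_tensors="pt")

# Label 0 = "B follows A", label 1 = "B is a random sentence".
labels = torch.tensor([0])

outputs = model(**inputs, labels=labels)
print(outputs.logits)  # two logits: is-next vs. not-next
print(outputs.loss)    # classification loss for this single pair
```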