I have a dataset with 100,000 rows. I load it into a dataframe, shuffle it, and split it into train and test sets:
# Read tsv content to dataframe.
df = pd.read_csv(data_loca
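
For illustration, a minimal self-contained sketch of the load, shuffle, and split steps described above might look like the following. The in-memory TSV, the column names `text` and `label`, the 80/20 ratio, and the fixed `random_state` are all assumptions made for the example; none of them come from the original code, which is cut off.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real TSV file: 10 rows, two columns.
tsv_text = "text\tlabel\n" + "\n".join(f"row{i}\t{i % 2}" for i in range(10))

# Read tab-separated content into a dataframe.
df = pd.read_csv(io.StringIO(tsv_text), sep="\t")

# Shuffle all rows; random_state is fixed only so the sketch is repeatable.
shuffled = df.sample(frac=1, random_state=0).reset_index(drop=True)

# Assumed 80/20 split; the ratio used in the original is not known.
split_at = int(len(shuffled) * 0.8)
train_df = shuffled.iloc[:split_at]
test_df = shuffled.iloc[split_at:]
```

Slicing a single shuffled dataframe this way guarantees that the train and test sets are disjoint and together cover every row.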