I have a dataset of about 6,000,000 rows. I use the code below to split it into a training set and a test set:
from sklearn.model_selection import train_test