Thoughts about train_test_split for machine learning
Question: I just noticed that many people call train_test_split before they even handle the missing data, so they seem to split the data at the very beginning. There are also plenty of people who split the data right before the model-building step, after they have done all the data cleaning, feature engineering, and feature selection. The people who split at the very beginning say that it is to prevent data leakage. Right now I am just so confused about the pipeline of
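To make the leakage argument concrete, here is a minimal sketch in plain Python, assuming that the cleaning step is mean imputation of a single numeric column. The helper names (`split_rows`, `fit_mean`, `impute`) and the toy data are hypothetical and only for illustration; they are not part of scikit-learn. The sketch compares the fill value learned from the training rows only with the one learned from all rows.

```python
import random
import statistics


def split_rows(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and return (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_mean(values):
    """Learn the fill value from the observed (non-None) entries."""
    return statistics.mean(v for v in values if v is not None)


def impute(values, fill):
    """Replace every missing entry with the given fill value."""
    return [fill if v is None else v for v in values]


# Hypothetical toy column with two missing entries and one outlier.
data = [1.0, 2.0, None, 4.0, 100.0, None, 3.0, 5.0]

# Split first: the fill value is learned from the training rows only,
# then reused unchanged on the test rows.
train, test = split_rows(data)
train_mean = fit_mean(train)
train_clean = impute(train, train_mean)
test_clean = impute(test, train_mean)

# Impute first: the fill value is computed from all rows, so values
# that later land in the test set influence the training data.
leaky_mean = fit_mean(data)

print("fill value from training rows only:", train_mean)
print("fill value from all rows (leaky):  ", leaky_mean)
```

In scikit-learn the same pattern would be to call `train_test_split` first, then fit an imputer such as `SimpleImputer` on the training part and only transform the test part.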