I am solving a machine learning problem to trace student knowledge. I have two data frames train_df.csv and test_df.csv. I am doing feature engineering on the given data set