Specify list of possible values for Pandas get_dummies

半阙折子戏 2021-02-14 17:36

Suppose I have a pandas DataFrame like the one below, and I'm encoding categorical_1 for training in scikit-learn:

data = {'numeric_1':[12.1, 3.2, 5.5, 6.8, 9.9],

4 Answers
  •  猫巷女王i
    2021-02-14 18:06

    I ran into the same problem: how to keep the dummy columns consistent between training and test data when using get_dummies() in pandas. I found a solution while exploring the House Prices competition on Kaggle: process the training and test data together. Suppose you have two DataFrames, df_train and df_test, neither of which contains the target column.

    import pandas as pd

    all_data = pd.concat([df_train, df_test], axis=0)  # stack train on top of test
    all_data = pd.get_dummies(all_data)                # one shared set of dummy columns
    n_train = df_train.shape[0]
    X_train = all_data.iloc[:n_train]  # rows that came from the training data
    X_test  = all_data.iloc[n_train:]  # rows that came from the test data
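A minimal, self-contained run of the approach above, assuming a hypothetical categorical_1 column whose values differ between the two sets (the toy frames are made up for illustration). Both splits end up with the same dummy columns, including columns for values that appear in only one split. Splitting with .iloc matters here because pd.concat keeps the original, now duplicated, row labels.

```python
import pandas as pd

# Hypothetical toy data: the test set lacks "green" and contains
# "blue", which the training set never saw.
df_train = pd.DataFrame({"numeric_1": [12.1, 3.2, 5.5],
                         "categorical_1": ["red", "green", "red"]})
df_test = pd.DataFrame({"numeric_1": [6.8, 9.9],
                        "categorical_1": ["blue", "red"]})

# Encode both sets together so they share one set of dummy columns.
all_data = pd.concat([df_train, df_test], axis=0)
all_data = pd.get_dummies(all_data)

# Split back by position; the row labels are duplicated after concat.
n_train = df_train.shape[0]
X_train = all_data.iloc[:n_train]
X_test = all_data.iloc[n_train:]

print(list(X_train.columns))
print(list(X_test.columns))
```

One trade-off: this only works when the test data is available at the time the training features are built; data scored later still has to be aligned to the training columns.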
    

    Hope it helps.
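Closer to what the question title asks for: if the full set of possible values is known in advance, declaring it with pd.Categorical(..., categories=...) makes get_dummies emit one column per declared category, even for values that never occur in a particular frame. This is a sketch under that assumption; CATEGORIES, encode, and the toy frames are hypothetical names chosen for illustration.

```python
import pandas as pd

# Assumed full list of allowed values for categorical_1 (hypothetical).
CATEGORIES = ["blue", "green", "red"]


def encode(df: pd.DataFrame) -> pd.DataFrame:
    """One-hot encode categorical_1 with a fixed set of dummy columns."""
    out = df.copy()
    # Declaring the categories makes get_dummies create a column for each,
    # including categories absent from this particular frame.
    out["categorical_1"] = pd.Categorical(out["categorical_1"],
                                          categories=CATEGORIES)
    return pd.get_dummies(out, columns=["categorical_1"])


train = pd.DataFrame({"numeric_1": [12.1, 3.2],
                      "categorical_1": ["red", "green"]})
test = pd.DataFrame({"numeric_1": [9.9], "categorical_1": ["blue"]})

X_train = encode(train)
X_test = encode(test)
print(list(X_train.columns))
print(list(X_test.columns))
```

Values not in CATEGORIES become NaN in the categorical column, so their rows get all-zero dummies rather than a new column; whether that is acceptable depends on the model.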
