Specify list of possible values for Pandas get_dummies

半阙折子戏 2021-02-14 17:36

Suppose I have a Pandas DataFrame like the one below, and I'm encoding categorical_1 for training in scikit-learn:

data = {'numeric_1':[12.1, 3.2, 5.5, 6.8, 9.9],


        
4 Answers
  •  一整个雨季
    2021-02-14 18:06

    To handle a mismatch between the sets of categorical values in the training and test sets, I used:

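        import numpy as np
        import pandas as pd

        length = test_categorical_data.shape[0]  # rows in the test set, not the training set
        empty_col = np.zeros(length)  # 1-D filler for dummy columns missing from the test set
        # Reuse the test index so the filler and the existing columns stay aligned
        test_categorical_data_processed = pd.DataFrame(index=test_categorical_data.index)
        for col in train_categorical_data.columns:
            test_categorical_data_processed[col] = test_categorical_data.get(col, empty_col)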
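
For comparison, here is a minimal sketch of two other ways to get the same column layout in both sets. It is not the approach used in this answer. The toy values and the `possible_values` list are made up for illustration; only the column name `categorical_1` comes from the question. Option 1 aligns the test dummies to the training columns with `DataFrame.reindex`, and Option 2 declares the full category list before calling `pd.get_dummies`.

```python
import pandas as pd

# Hypothetical toy data: the values are assumptions, not the question's data.
train = pd.DataFrame({"categorical_1": ["A", "B", "C", "A"]})
test = pd.DataFrame({"categorical_1": ["B", "A", "B"]})

train_dummies = pd.get_dummies(train, columns=["categorical_1"])
test_dummies = pd.get_dummies(test, columns=["categorical_1"])

# Option 1: align the test columns to the training columns.
# Columns absent from the test set are created and filled with 0;
# columns present only in the test set are dropped.
test_aligned = test_dummies.reindex(columns=train_dummies.columns, fill_value=0)

# Option 2: declare every possible value up front, so get_dummies emits
# a column for each category even when it does not occur in the data.
possible_values = ["A", "B", "C"]  # assumed to be known in advance
test_cat = test.copy()
test_cat["categorical_1"] = pd.Categorical(
    test_cat["categorical_1"], categories=possible_values
)
test_declared = pd.get_dummies(test_cat, columns=["categorical_1"])
```

Option 1 needs the training dummies at prediction time, while Option 2 only needs the list of allowed values, which may be easier to store alongside a fitted scikit-learn model.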
    
