Problems with binary one-hot (one-of-K) coding in Python

我寻月下人不归 2021-02-02 03:57

Binary one-hot (also known as one-of-K) coding consists of creating one binary column for each distinct value of a categorical variable. For example, if one has a color column (categ
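A minimal sketch of the mismatch that both answers work around, using made-up train and test frames (the data is hypothetical): because pd.get_dummies creates one column per value it actually sees in a frame, encoding the two frames separately yields different sets of columns.

```python
import pandas as pd

# Hypothetical frames: "blue" and "green" occur only in train,
# "yellow" occurs only in test.
train = pd.DataFrame({"color": ["red", "blue", "green"]})
test = pd.DataFrame({"color": ["red", "yellow"]})

# Each frame is encoded on its own, so each gets columns only for
# the values present in that frame (sorted alphabetically).
train_dummies = pd.get_dummies(train)
test_dummies = pd.get_dummies(test)

print(list(train_dummies.columns))  # ['color_blue', 'color_green', 'color_red']
print(list(test_dummies.columns))   # ['color_red', 'color_yellow']
```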

2 Answers
  • 2021-02-02 04:03

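    If both DataFrames have the same columns, you can concatenate them, call get_dummies once, and then split the result back into the train and test rows, e.g.,

    import pandas as pd

    # Stack train on top of test so both get the same set of dummy columns
    encoded = pd.get_dummies(pd.concat([train, test], axis=0))
    # Split the encoded frame back apart by row position
    train_rows = train.shape[0]
    train_encoded = encoded.iloc[:train_rows, :]
    test_encoded = encoded.iloc[train_rows:, :]


    pd.concat aligns the frames by column name, so the column order itself does not matter; if the column names differ between the two frames, though, you'll have challenges regardless of what method you try.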
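The same concat-encode-split idea can be wrapped in a small function. This is a minimal sketch, not the answer author's code: the helper name encode_together and the example frames are hypothetical, ignore_index=True is an added choice that gives the combined frame a fresh 0..n-1 index, and dtype=int is passed so the dummies come out as 0/1 (pandas 2.0 and later return bool columns by default).

```python
import pandas as pd


def encode_together(train: pd.DataFrame, test: pd.DataFrame):
    """Hypothetical helper: one-hot encode train and test with shared columns.

    The frames are stacked row-wise, encoded once, and split back by position,
    so every value seen in either frame gets a column in both results.
    """
    combined = pd.concat([train, test], axis=0, ignore_index=True)
    encoded = pd.get_dummies(combined, dtype=int)
    n_train = len(train)
    return encoded.iloc[:n_train], encoded.iloc[n_train:]


train = pd.DataFrame({"color": ["red", "blue", "green"]})
test = pd.DataFrame({"color": ["red", "yellow"]})

train_enc, test_enc = encode_together(train, test)
print(list(train_enc.columns))
# ['color_blue', 'color_green', 'color_red', 'color_yellow']
```

Note that this only works when the test data is available at encoding time; rows that arrive later would have to go through the same concatenation again.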

  • 2021-02-02 04:10

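    You can give the column a categorical dtype that lists every possible category. get_dummies then creates one column per listed category, including categories that do not occur in the data: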

    In [5]: df_train = pd.DataFrame({
       ...:     "car": pd.Categorical(["seat", "bmw"], categories=["seat", "bmw", "mercedes"]),
       ...:     "color": ["red", "green"],
       ...: })
    
    In [6]: df_train
    Out[6]: 
        car  color
    0  seat    red
    1   bmw  green
    
    In [7]: pd.get_dummies(df_train, dtype=int)
    Out[7]: 
       car_seat  car_bmw  car_mercedes  color_green  color_red
    0         1        0             0            0          1
    1         0        1             0            1          0
    

    See this pandas issue.
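The same idea can be wrapped in a helper so that train and test frames, encoded separately, always end up with identical columns. This is a minimal sketch under stated assumptions: the helper encode_with_fixed_categories and the CATEGORIES mapping are hypothetical, and pd.Categorical(..., categories=...) is used instead of the older astype('category', categories=...) form, which newer pandas versions no longer accept.

```python
import pandas as pd

# Hypothetical mapping of each categorical column to every category
# the encoding should know about (values taken from the example above).
CATEGORIES = {"car": ["seat", "bmw", "mercedes"], "color": ["green", "red"]}


def encode_with_fixed_categories(df: pd.DataFrame, categories: dict) -> pd.DataFrame:
    """One-hot encode df so every listed category gets a column.

    Values missing from the list become NaN in the Categorical and
    therefore encode as a row of zeros (dummy_na defaults to False).
    """
    df = df.copy()
    for col, cats in categories.items():
        df[col] = pd.Categorical(df[col], categories=cats)
    return pd.get_dummies(df, dtype=int)


df_train = pd.DataFrame({"car": ["seat", "bmw"], "color": ["red", "green"]})
df_test = pd.DataFrame({"car": ["mercedes"], "color": ["red"]})

train_enc = encode_with_fixed_categories(df_train, CATEGORIES)
test_enc = encode_with_fixed_categories(df_test, CATEGORIES)
print(list(test_enc.columns))
# ['car_seat', 'car_bmw', 'car_mercedes', 'color_green', 'color_red']
```

Unlike concatenating the frames, this does not require the test data up front; the trade-off is that the full list of categories has to be known (or taken from the training data) in advance.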
