Pandas - Convert a categorical column to binary encoded form

面向向阳花 2021-02-09 22:10

I have a dataset that looks like this:

     yyyy      month        tmax         tmin
0    1908    January         5.0         -1.4
1    1908   February         7         


        
2 answers
  • 2021-02-09 22:46

    I think you need get_dummies:

    dummies = pd.get_dummies(df['month'])
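
For anyone who wants to try this without the full dataset, here is a minimal sketch. The three-row frame is a hypothetical sample that mirrors the first rows of the question, and `dtype=int` is my addition: pandas 2.0 and later return boolean columns by default, so it is only needed if you want the 0/1 output shown in this thread.

```python
import pandas as pd

# Hypothetical sample mirroring the first rows of the question's data.
df = pd.DataFrame({
    "yyyy": [1908, 1908, 1908],
    "month": ["January", "February", "March"],
    "tmax": [5.0, 7.3, 6.2],
    "tmin": [-1.4, 1.9, 0.3],
})

# dtype=int requests 0/1 columns; without it, pandas 2.0+ returns booleans.
dummies = pd.get_dummies(df["month"], dtype=int)
print(dummies)
```

The result has one column per distinct month, sorted alphabetically, and each row contains exactly one 1.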
    

    To add the new columns to the original DataFrame and remove month, use join with pop:

    df2 = df.join(pd.get_dummies(df.pop('month')))
    print (df2.head())
       yyyy  tmax  tmin  April  August  December  February  January  July  June  \
    0  1908   5.0  -1.4      0       0         0         0        1     0     0   
    1  1908   7.3   1.9      0       0         0         1        0     0     0   
    2  1908   6.2   0.3      0       0         0         0        0     0     0   
    3  1908   7.4   2.1      1       0         0         0        0     0     0   
    4  1908  16.5   7.7      0       0         0         0        0     0     0   
    
       March  May  November  October  September  
    0      0    0         0        0          0  
    1      0    0         0        0          0  
    2      1    0         0        0          0  
    3      0    0         0        0          0  
    4      0    1         0        0          0  
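
One thing worth knowing about the pop variant is that it changes the original frame. The sketch below splits the one-liner into two steps to make that visible; the two-row frame is a hypothetical sample and `dtype=int` is an assumption to get 0/1 values on current pandas.

```python
import pandas as pd

# Hypothetical two-row sample.
df = pd.DataFrame({
    "yyyy": [1908, 1908],
    "month": ["January", "February"],
    "tmax": [5.0, 7.3],
})

# pop() removes 'month' from df in place and returns it as a Series,
# so by the time join() runs, df no longer has that column.
month = df.pop("month")
df2 = df.join(pd.get_dummies(month, dtype=int))

print(list(df.columns))   # the original frame, now without 'month'
print(list(df2.columns))  # remaining columns plus one per month
```

If the original frame has to stay untouched, `df.drop(columns='month')` returns a copy without the column and can be joined the same way.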
    

    If you do not need to remove the month column:

    df2 = df.join(pd.get_dummies(df['month']))
    print (df2.head())
       yyyy     month  tmax  tmin  April  August  December  February  January  \
    0  1908   January   5.0  -1.4      0       0         0         0        1   
    1  1908  February   7.3   1.9      0       0         0         1        0   
    2  1908     March   6.2   0.3      0       0         0         0        0   
    3  1908     April   7.4   2.1      1       0         0         0        0   
    4  1908       May  16.5   7.7      0       0         0         0        0   
    
       July  June  March  May  November  October  September  
    0     0     0      0    0         0        0          0  
    1     0     0      0    0         0        0          0  
    2     0     0      1    0         0        0          0  
    3     0     0      0    0         0        0          0  
    4     0     0      0    1         0        0          0  
    

    If you need the columns in calendar order, there are several options. Use reindex, either with axis=1 or with columns= (the older reindex_axis was removed in pandas 1.0):

    months = ['January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']
    
    df1 = pd.get_dummies(df['month']).reindex(months, axis=1)
    print (df1.head())
       January  February  March  April  May  June  July  August  September  \
    0        1         0      0      0    0     0     0       0          0   
    1        0         1      0      0    0     0     0       0          0   
    2        0         0      1      0    0     0     0       0          0   
    3        0         0      0      1    0     0     0       0          0   
    4        0         0      0      0    1     0     0       0          0   
    
       October  November  December  
    0        0         0         0  
    1        0         0         0  
    2        0         0         0  
    3        0         0         0  
    4        0         0         0  
    
    df1 = pd.get_dummies(df['month']).reindex(columns=months)
    print (df1.head())
       January  February  March  April  May  June  July  August  September  \
    0        1         0      0      0    0     0     0       0          0   
    1        0         1      0      0    0     0     0       0          0   
    2        0         0      1      0    0     0     0       0          0   
    3        0         0      0      1    0     0     0       0          0   
    4        0         0      0      0    1     0     0       0          0   
    
       October  November  December  
    0        0         0         0  
    1        0         0         0  
    2        0         0         0  
    3        0         0         0  
    4        0         0         0  
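
A caveat with reindex: if a month never occurs in the data, its column comes back filled with NaN. The sketch below assumes a hypothetical series where only two months appear and passes `fill_value=0` so the absent months get zeros instead; `dtype=int` is again an assumption to keep the 0/1 output.

```python
import pandas as pd

months = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]

# Hypothetical partial data: only two of the twelve months occur.
s = pd.Series(["March", "January", "March"], name="month")

# fill_value=0 gives absent months a column of zeros instead of NaN.
dummies = pd.get_dummies(s, dtype=int).reindex(columns=months, fill_value=0)
print(dummies.iloc[:, :4])
```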
    

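    Or convert the month column to an ordered categorical (on current pandas, pass a CategoricalDtype; astype('category', categories=...) is no longer accepted):

    df1 = pd.get_dummies(df['month'].astype(pd.CategoricalDtype(categories=months, ordered=True)))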
    print (df1.head())
       January  February  March  April  May  June  July  August  September  \
    0        1         0      0      0    0     0     0       0          0   
    1        0         1      0      0    0     0     0       0          0   
    2        0         0      1      0    0     0     0       0          0   
    3        0         0      0      1    0     0     0       0          0   
    4        0         0      0      0    1     0     0       0          0   
    
       October  November  December  
    0        0         0         0  
    1        0         0         0  
    2        0         0         0  
    3        0         0         0  
    4        0         0         0  
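
The categorical route can be sketched on its own as follows, using `pd.CategoricalDtype`. The short series is a hypothetical sample, and `month_type` is just an illustrative name. Because the categories are declared up front, get_dummies emits a column for every month in the declared order, including months that never occur in the data.

```python
import pandas as pd

months = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]

# Hypothetical partial data: only two of the twelve months occur.
s = pd.Series(["March", "January", "March"], name="month")

# Declare the full, ordered set of categories once.
month_type = pd.CategoricalDtype(categories=months, ordered=True)

# Columns follow the category order; unused categories become all-zero columns.
dummies = pd.get_dummies(s.astype(month_type), dtype=int)
print(dummies.shape)
```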
    
  • 2021-02-09 22:49

    IIUC,

    You can use assign, the ** unpacking operator, and pd.get_dummies:

    df.assign(**pd.get_dummies(df['month']))
    

    Output:

        yyyy      month  tmax  tmin  April  August  December  February  January  \
    0   1908    January   5.0  -1.4      0       0         0         0        1   
    1   1908   February   7.3   1.9      0       0         0         1        0   
    2   1908      March   6.2   0.3      0       0         0         0        0   
    3   1908      April   7.4   2.1      1       0         0         0        0   
    4   1908        May  16.5   7.7      0       0         0         0        0   
    5   1908       June  17.7   8.7      0       0         0         0        0   
    6   1908       July  20.1  11.0      0       0         0         0        0   
    7   1908     August  17.5   9.7      0       1         0         0        0   
    8   1908  September  16.3   8.4      0       0         0         0        0   
    9   1908    October  14.6   8.0      0       0         0         0        0   
    10  1908   November   9.6   3.4      0       0         0         0        0   
    11  1908   December   5.8  -0.3      0       0         1         0        0   
    12  1909    January   5.0   0.1      0       0         0         0        1   
    13  1909   February   5.5  -0.3      0       0         0         1        0   
    14  1909      March   5.6  -0.3      0       0         0         0        0   
    15  1909      April  12.2   3.3      1       0         0         0        0   
    16  1909        May  14.7   4.8      0       0         0         0        0   
    17  1909       June  15.0   7.5      0       0         0         0        0   
    18  1909       July  17.3  10.8      0       0         0         0        0   
    19  1909     August  18.8  10.7      0       1         0         0        0   
    20  1909  September  14.5   8.1      0       0         0         0        0   
    21  1909    October  12.9   6.9      0       0         0         0        0   
    22  1909   November   7.5   1.7      0       0         0         0        0   
    23  1909   December   5.3   0.4      0       0         1         0        0   
    24  1910    January   5.2  -0.5      0       0         0         0        1   
    
        July  June  March  May  November  October  September  
    0      0     0      0    0         0        0          0  
    1      0     0      0    0         0        0          0  
    2      0     0      1    0         0        0          0  
    3      0     0      0    0         0        0          0  
    4      0     0      0    1         0        0          0  
    5      0     1      0    0         0        0          0  
    6      1     0      0    0         0        0          0  
    7      0     0      0    0         0        0          0  
    8      0     0      0    0         0        0          1  
    9      0     0      0    0         0        1          0  
    10     0     0      0    0         1        0          0  
    11     0     0      0    0         0        0          0  
    12     0     0      0    0         0        0          0  
    13     0     0      0    0         0        0          0  
    14     0     0      1    0         0        0          0  
    15     0     0      0    0         0        0          0  
    16     0     0      0    1         0        0          0  
    17     0     1      0    0         0        0          0  
    18     1     0      0    0         0        0          0  
    19     0     0      0    0         0        0          0  
    20     0     0      0    0         0        0          1  
    21     0     0      0    0         0        1          0  
    22     0     0      0    0         1        0          0  
    23     0     0      0    0         0        0          0  
    24     0     0      0    0         0        0          0 
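
To spell out what the ** unpacking does here, a small sketch on a hypothetical two-row frame: the dummy DataFrame is unpacked into keyword arguments, one per column, which relies on every column label being a string. `pd.concat` is shown next to it as an alternative that does not have that restriction; the names `via_assign` and `via_concat` are illustrative only.

```python
import pandas as pd

# Hypothetical two-row sample.
df = pd.DataFrame({
    "yyyy": [1908, 1908],
    "month": ["January", "February"],
})

dummies = pd.get_dummies(df["month"], dtype=int)

# ** turns the dummy frame into keyword arguments (column name -> Series),
# which only works because every column label is a string.
via_assign = df.assign(**dummies)

# concat places the same columns side by side and also copes with
# non-string labels.
via_concat = pd.concat([df, dummies], axis=1)

print(via_assign)
```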
    