Converting pandas column of comma-separated strings into dummy variables

别那么骄傲 2020-12-03 10:44

In my DataFrame, I have a categorical variable that I'd like to convert into dummy variables. This column, however, has multiple values separated by commas:

0        a
1    a,b,c
2    a,b,d
3        d
4      c,d
2 answers
  • 2020-12-03 11:23

    Use str.get_dummies

    df['col'].str.get_dummies(sep=',')
    
        a   b   c   d
    0   1   0   0   0
    1   1   1   1   0
    2   1   1   0   1
    3   0   0   0   1
    4   0   0   1   1
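
    For readers who want to run this end to end, here is a minimal self-contained sketch. The sample data is an assumption, chosen to match the output shown above. Note that sep defaults to '|', so it has to be set to ',' for comma-separated values.

```python
import pandas as pd

# Sample data assumed for illustration; it reproduces the output shown above.
df = pd.DataFrame({'col': ['a', 'a,b,c', 'a,b,d', 'd', 'c,d']})

# Split each string on ',' and build one 0/1 indicator column per distinct value.
dummies = df['col'].str.get_dummies(sep=',')
print(dummies)
```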
    

    Edit: Updating the answer to address some questions.

    Qn 1: Why does the Series method str.get_dummies not accept the argument prefix=..., while pandas.get_dummies() does?

    Series.str.get_dummies is a Series-level method (as the name suggests). It one-hot encodes the values of a single Series (or DataFrame column), so there is no need for a prefix. pandas.get_dummies, on the other hand, can one-hot encode multiple columns at once, in which case the prefix parameter identifies the original column.
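
    To illustrate the multi-column case, here is a small sketch. The frame and its column names ('color', 'size') are hypothetical and not part of the question. When several columns are encoded at once, each dummy column is prefixed with the name of its source column by default.

```python
import pandas as pd

# Hypothetical frame with two single-valued categorical columns.
df = pd.DataFrame({'color': ['red', 'blue'], 'size': ['S', 'M']})

# Encoding both columns at once: the source column name is used as the
# default prefix, so the origin of every dummy column stays identifiable.
encoded = pd.get_dummies(df, columns=['color', 'size'])
print(list(encoded.columns))
```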

    If you want a prefix on the columns returned by str.get_dummies, you can chain DataFrame.add_prefix:

    df['col'].str.get_dummies(sep=',').add_prefix('col_')
    

    Qn 2: If you have more than one column to begin with, how do you merge the dummies back into the original frame?

    You can use pd.concat to join the one-hot encoded columns with the rest of the columns in the DataFrame, then drop the original column.

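    import pandas as pd
    df = pd.DataFrame({'other': ['x', 'y', 'x', 'x', 'q'], 'col': ['a', 'a,b,c', 'a,b,d', 'd', 'c,d']})
    # Append the dummy columns, then drop the original comma-separated column.
    df = pd.concat([df, df['col'].str.get_dummies(sep=',')], axis=1).drop(columns='col')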
    
      other a   b   c   d
    0   x   1   0   0   0
    1   y   1   1   1   0
    2   x   1   1   0   1
    3   x   0   0   0   1
    4   q   0   0   1   1
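
    The same idea may extend to several comma-separated columns. The sketch below assumes a hypothetical frame with two such columns ('tags' and 'langs') and adds a per-column prefix so that identical values in different columns cannot collide.

```python
import pandas as pd

# Hypothetical frame with two multi-valued columns.
df = pd.DataFrame({
    'id': [1, 2],
    'tags': ['a,b', 'b'],
    'langs': ['en', 'en,fr'],
})

multi_cols = ['tags', 'langs']

# Encode each column on its own and prefix the dummies with the source name.
parts = [df[c].str.get_dummies(sep=',').add_prefix(c + '_') for c in multi_cols]

# Keep the untouched columns and append all dummy blocks side by side.
result = pd.concat([df.drop(columns=multi_cols)] + parts, axis=1)
print(result)
```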
    
  • 2020-12-03 11:45

    The str.get_dummies method does not accept a prefix parameter, but you can rename the columns of the returned dummy DataFrame:

    data['col'].str.get_dummies(sep=',').rename(lambda x: 'col_' + x, axis='columns')
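
    As a quick check, the sketch below (with sample data assumed for illustration) compares this rename approach with DataFrame.add_prefix. For a plain prefix the two should give the same frame; rename with a function is the more general tool when the new names are not a simple prefix.

```python
import pandas as pd

# Sample data assumed for illustration.
data = pd.DataFrame({'col': ['a', 'a,b,c', 'a,b,d', 'd', 'c,d']})
dummies = data['col'].str.get_dummies(sep=',')

# Rename every column label through a function...
renamed = dummies.rename(lambda x: 'col_' + x, axis='columns')
# ...or prepend a fixed string to every column label.
prefixed = dummies.add_prefix('col_')

print(list(renamed.columns))
print(renamed.equals(prefixed))
```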
    