Converting pandas column of comma-separated strings into dummy variables

Submitted by 余生长醉 on 2019-11-27 03:33:23

Question


In my DataFrame, I have a categorical variable that I'd like to convert into dummy variables. This column, however, has multiple values separated by commas:

0    'a'
1    'a,b,c'
2    'a,b,d'
3    'd'
4    'c,d'

Ultimately, I want one binary column for each distinct value. In other words, the number of output columns should equal the number of unique values across all the comma-separated entries. I imagine I'd have to use split() to get the individual values, but I'm not sure what to do afterwards. Any hints are much appreciated!

Edit: One additional twist is that the column contains null values. In response to a comment, here is the desired output. Thanks!

   a  b  c  d
0  1  0  0  0
1  1  1  1  0
2  1  1  0  1
3  0  0  0  1
4  0  0  1  1
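For reference, the split()-based route the question has in mind can be sketched roughly as follows. This is an alternative to the answers below, not something from the original thread. It assumes pandas 0.25 or later (for Series.explode), and the sample data appends one hypothetical NaN row to stand in for the null values mentioned in the edit:

```python
import numpy as np
import pandas as pd

# Sample column from the question; the trailing np.nan is an assumed null row.
df = pd.DataFrame({"col": ["a", "a,b,c", "a,b,d", "d", "c,d", np.nan]})

# 1. Split each string into a list of tokens (a NaN entry stays NaN).
tokens = df["col"].str.split(",")

# 2. Explode: one row per token, with the original row index repeated.
exploded = tokens.explode()

# 3. One-hot encode the tokens; NaN gets all zeros because dummy_na defaults to False.
dummies = pd.get_dummies(exploded, dtype=int)

# 4. Collapse back to one row per original index by taking the max per column.
result = dummies.groupby(level=0).max()
print(result)
```

Rows 0 to 4 should match the desired table above, and the null row should come out as all zeros.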

Answer 1:


Use Series.str.get_dummies, which splits each string on the separator and creates one 0/1 column per distinct value:

df['col'].str.get_dummies(sep=',')

    a   b   c   d
0   1   0   0   0
1   1   1   1   0
2   1   1   0   1
3   0   0   0   1
4   0   0   1   1
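A self-contained version of this answer, including the null-value twist, might look like the sketch below. The NaN row is an assumption added for illustration. As far as I can tell, str.get_dummies treats missing values as empty strings before splitting, so a null entry should produce a row of zeros rather than an error:

```python
import numpy as np
import pandas as pd

# Column from the question plus an assumed missing value for the "null values" edit.
df = pd.DataFrame({"col": ["a", "a,b,c", "a,b,d", "d", "c,d", np.nan]})

# Split on commas and build one 0/1 column per distinct token.
dummies = df["col"].str.get_dummies(sep=",")
print(dummies)
```

Note that tokens are not stripped of whitespace, so an entry such as 'a, b' would produce a separate ' b' column unless the spaces are removed first.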



Answer 2:


Unlike pd.get_dummies, the str.get_dummies method does not accept a prefix parameter, but you can rename the columns of the returned dummy DataFrame:

data['col'].str.get_dummies(sep=',').rename(lambda x: 'col_' + x, axis='columns')
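The rename call above can also be written with DataFrame.add_prefix, which prepends a string to every column label. The sketch below compares the two on the question's sample data (the DataFrame named data is built here only for illustration) and should yield identical frames:

```python
import pandas as pd

# Sample data from the question, stored under the name used in this answer.
data = pd.DataFrame({"col": ["a", "a,b,c", "a,b,d", "d", "c,d"]})

dummies = data["col"].str.get_dummies(sep=",")

# Approach from the answer: rename each column label via a function.
renamed = dummies.rename(lambda x: "col_" + x, axis="columns")

# Shortcut: add_prefix prepends the same string to every column label.
prefixed = dummies.add_prefix("col_")

print(prefixed.columns.tolist())
```

add_prefix is shorter when the prefix is a fixed string; rename with a function is more flexible if the new labels need other transformations.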


Source: https://stackoverflow.com/questions/46867201/converting-pandas-column-of-comma-separated-strings-into-dummy-variables
