One Hot Encoding with multiple tags in the column

Posted by 大城市里の小女人 on 2020-12-21 04:01:42

Question


I have a simple dataset.

id,question,category,tags,day,quarter,group_id
1,What is your name,Introduction,Introduction,1,3,0
2,What is your name,Introduction,"Introduction, work",1,3,1

As you can see, the tags column can hold multiple values separated by commas. If I one-hot encode it with the pandas get_dummies function, each distinct string (for example "Introduction, work") becomes a single column. I want a separate column for each individual tag instead. How can I do that?


Answer 1:


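I believe you need str.get_dummies, which splits each string on the given separator and creates one indicator column per distinct value: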

df1 = df['tags'].str.get_dummies(', ')
print (df1)

   Introduction  work
0             1     0
1             1     1
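
For completeness, here is a minimal self-contained sketch of the same idea that also attaches the indicator columns back to the rest of the data. The sample rows are copied from the question; the variable names (csv_text, tag_dummies, encoded) are only illustrative, and joining the result back is an assumption about what the asker wants in the end.

```python
import io

import pandas as pd

# Sample data copied from the question
csv_text = '''id,question,category,tags,day,quarter,group_id
1,What is your name,Introduction,Introduction,1,3,0
2,What is your name,Introduction,"Introduction, work",1,3,1
'''
df = pd.read_csv(io.StringIO(csv_text))

# One indicator column per individual tag; the separator must match
# the data exactly (comma followed by a space here)
tag_dummies = df['tags'].str.get_dummies(sep=', ')

# Replace the raw tags column with the indicator columns
encoded = df.drop(columns='tags').join(tag_dummies)
print(encoded)
```

If the real data is not consistent about the space after the comma, the tags would need to be normalized first, otherwise "work" and " work" end up as different columns.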



Answer 2:


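You could try the pivot_table function from pandas. The following code might be useful. Note that 'D' is a placeholder for the column to aggregate (the sample data has no column with that name), and that with columns=['tags'] each distinct tags string becomes one column, so comma-separated tags are not split by this call alone.

pd.pivot_table(df, values='D', index=['id','question','category','day','quarter','group_id'], columns=['tags'])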
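
A pivot-style approach needs the data in long form first, with one row per (id, tag) pair. The sketch below is one possible way to do that, not the answerer's code: it splits and explodes the tags, then uses pd.crosstab in place of pivot_table to count tags per id. The reduced two-column frame and the names long_df and one_hot are illustrative assumptions.

```python
import pandas as pd

# Reduced version of the question's data (only the columns needed here)
df = pd.DataFrame({
    'id': [1, 2],
    'tags': ['Introduction', 'Introduction, work'],
})

# Split the comma-separated string into a list, then emit one row per tag
long_df = df.assign(tag=df['tags'].str.split(', ')).explode('tag')

# Count how often each tag occurs for each id; as long as a tag appears
# at most once per row, the counts are 0/1 indicators
one_hot = pd.crosstab(long_df['id'], long_df['tag'])
print(one_hot)
```

The result is indexed by id, so it can be merged back onto the original frame on that column if the other fields are needed.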


Source: https://stackoverflow.com/questions/50523537/one-hot-encoding-with-multiple-tags-in-the-column
