Question
I have a simple dataset.
id,question,category,tags,day,quarter,group_id
1,What is your name,Introduction,Introduction,1,3,0
2,What is your name,Introduction,"Introduction, work",1,3,1
As you can see, the tags column contains multiple values separated by commas. If I one-hot encode it with the pandas get_dummies function, each combined string is treated as a single column. But I want a separate column for each tag. How can I do that?
Answer 1:
I believe you need str.get_dummies:
df1 = df['tags'].str.get_dummies(', ')
print (df1)
   Introduction  work
0             1     0
1             1     1
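For completeness, here is a minimal sketch that rebuilds the sample data from the question and attaches the indicator columns to the rest of the frame. Dropping the original tags column and the names csv_text, tag_dummies and encoded are assumptions made for illustration, not part of the answer above.

```python
import io
import pandas as pd

# Sample data copied from the question.
csv_text = '''id,question,category,tags,day,quarter,group_id
1,What is your name,Introduction,Introduction,1,3,0
2,What is your name,Introduction,"Introduction, work",1,3,1
'''
df = pd.read_csv(io.StringIO(csv_text))

# One indicator column per tag; sep must match the delimiter exactly
# (a comma followed by a space in this data).
tag_dummies = df['tags'].str.get_dummies(sep=', ')

# Assumption: the raw tags column is no longer needed once it is encoded.
encoded = pd.concat([df.drop(columns='tags'), tag_dummies], axis=1)
print(encoded)
```

If the real data mixes "a,b" and "a, b", the separator will not match consistently, so the spacing would need to be normalized first.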
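Answer 2:
You could also use pandas' pivot_table function. It does not split the comma-separated tags by itself, so each tag needs its own row first.
The following code might be useful:
long_df = df.assign(tags=df['tags'].str.split(', ')).explode('tags').assign(present=1)
pd.pivot_table(long_df, values='present', index=['id','question','category','day','quarter','group_id'], columns=['tags'], fill_value=0)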
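A minimal sketch of the pivot approach on the two sample rows from the question, assuming the tags are first split into one row per tag, since pivot_table does not do that itself. The present column and the names long_df, key_cols and wide are hypothetical helpers, not part of the original answer.

```python
import pandas as pd

# The two sample rows from the question.
df = pd.DataFrame({
    'id': [1, 2],
    'question': ['What is your name', 'What is your name'],
    'category': ['Introduction', 'Introduction'],
    'tags': ['Introduction', 'Introduction, work'],
    'day': [1, 1],
    'quarter': [3, 3],
    'group_id': [0, 1],
})

# Split each tags string into a list, then explode to one row per (record, tag).
long_df = df.assign(tags=df['tags'].str.split(', ')).explode('tags')

# Hypothetical helper column so pivot_table has a value to aggregate.
long_df['present'] = 1

key_cols = ['id', 'question', 'category', 'day', 'quarter', 'group_id']
wide = pd.pivot_table(long_df, values='present', index=key_cols,
                      columns='tags', aggfunc='max', fill_value=0)

# Cast back to integers and turn the key columns into ordinary columns again.
wide = wide.astype(int).reset_index()
print(wide)
```

For this particular task str.get_dummies is shorter; the pivot route is mainly of interest when the data is already in long form.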
Source: https://stackoverflow.com/questions/50523537/one-hot-encoding-with-multiple-tags-in-the-column