one-hot-encoding

One Hot Encoding with multiple tags in the column

。_饼干妹妹 submitted on 2020-12-21 04:03:29

Question: I have a simple dataset.

id,question,category,tags,day,quarter,group_id
1,What is your name,Introduction,Introduction,1,3,0
2,What is your name,Introduction,"Introduction, work",1,3,1

As you can see, the tags column holds multiple values separated by commas. If I one-hot encode it with the pandas get_dummies function, the whole string ends up as a single column. I want a separate column for each tag instead. How can I do that?

Answer 1: I believe you need str.get_dummies: df1 = df['tags'].str
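The answer's snippet is cut off, so the following is only a minimal sketch of how str.get_dummies could be applied to the sample data. The separator ", " (comma plus space), the variable names tag_dummies and result, and the step of joining the indicators back onto the frame are assumptions, not part of the original answer.

```python
import io

import pandas as pd

# Sample data copied from the question.
csv_text = '''id,question,category,tags,day,quarter,group_id
1,What is your name,Introduction,Introduction,1,3,0
2,What is your name,Introduction,"Introduction, work",1,3,1
'''
df = pd.read_csv(io.StringIO(csv_text))

# Split each tags string on ", " and build one indicator column per tag.
# Assumption: tags are always separated by a comma followed by one space.
tag_dummies = df['tags'].str.get_dummies(sep=', ')

# Replace the original tags column with the indicator columns.
result = df.drop(columns='tags').join(tag_dummies)
print(result)
```

If the spacing around the commas is inconsistent in the real data, one option is to normalize it first (for example with df['tags'].str.replace(r'\s*,\s*', ',', regex=True)) and then split on ',' alone.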

One-hot encoding for words which occur in multiple columns

感情迁移 submitted on 2020-12-13 04:50:05

Question: I want to create one-hot encoded data from categorical data, which you can see here.

   Label1          Label2      Label3
0  Street fashion  Clothing    Fashion
1  Clothing        Outerwear   Jeans
2  Architecture    Property    Clothing
3  Clothing        Black       Footwear
4  White           Photograph  Beauty

The problem (for me) is that one specific label (e.g. Clothing) can appear in Label1, Label2, or Label3. I tried pd.get_dummies, but this created data like:

   Label1_Clothing  Label2_Clothing  Label3_Clothing
0  0                1                0
1  1                0                0
2  0                0                1

Is there a way to only
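The question is cut off, so the goal here is an assumption: one indicator column per distinct label, no matter which of the three columns it appeared in. Under that assumption, one possible sketch is to stack the columns into a single Series, encode it, and collapse the result back to one row per original index. The names stacked and one_hot are illustrative only.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'Label1': ['Street fashion', 'Clothing', 'Architecture', 'Clothing', 'White'],
    'Label2': ['Clothing', 'Outerwear', 'Property', 'Black', 'Photograph'],
    'Label3': ['Fashion', 'Jeans', 'Clothing', 'Footwear', 'Beauty'],
})

# Stack the three label columns into one Series indexed by (row, column name).
stacked = df.stack()

# Encode every value once, then take the max per original row so a label is
# marked 1 if it occurred in any of the label columns of that row.
one_hot = pd.get_dummies(stacked).groupby(level=0).max().astype(int)
print(one_hot)
```

With this layout there is a single Clothing column rather than Label1_Clothing, Label2_Clothing, and Label3_Clothing; the information about which column a label came from is deliberately discarded.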