Question
I have two DataFrames with the following data:
df1:
====
id name age likes
--- ----- ---- -----
0 A 21 rose
1 B 22 apple
2 C 30 grapes
4 D 21 lily
df2:
====
category Fruit Flower
--------- ------- -------
orange 1 0
apple 1 0
rose 0 1
lily 0 1
grapes 1 0
What I am trying to do is add another column to df1 which would contain the word 'Fruit' or 'Flower' depending on the one-hot encoding in df2 for that entry. I am looking for a purely pandas/numpy implementation.
Any help would be appreciated.
Thanks!
Answer 1:
IIUC, you can use .apply with axis=1 (or axis="columns"), which applies the function to each row.
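df3 = df1.merge(df2, how='left', left_on='likes', right_on='category')
# List your one-hot columns here.
categories_col = ['Fruit', 'Flower']
def get_category(x):
    # Return the name of the first one-hot column set to 1 in this row.
    for category in categories_col:
        if x[category] == 1:
            return category
# how='left' keeps df1's row order; to_numpy() assigns by position, not by index.
df1["new"] = df3.apply(get_category, axis=1).to_numpy()
print(df1)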
id name age likes new
0 0 A 21 rose Flower
1 1 B 22 apple Fruit
2 2 C 30 grapes Fruit
3 4 D 21 lily Flower
But make sure the columns listed in categories_col really are one-hot encoded, i.e. exactly one of them is 1 in each row.
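If you would rather avoid the row-wise apply, here is a minimal vectorized sketch of the same idea. It assumes the one-hot columns are exactly Fruit and Flower and that the category values in df2 are unique; the names onehot and lookup are mine, not from the question.

```python
import pandas as pd

df1 = pd.DataFrame({
    "id": [0, 1, 2, 4],
    "name": ["A", "B", "C", "D"],
    "age": [21, 22, 30, 21],
    "likes": ["rose", "apple", "grapes", "lily"],
})
df2 = pd.DataFrame({
    "category": ["orange", "apple", "rose", "lily", "grapes"],
    "Fruit": [1, 1, 0, 0, 1],
    "Flower": [0, 0, 1, 1, 0],
})

categories_col = ["Fruit", "Flower"]

# Index the one-hot block by category name so it can serve as a lookup table.
onehot = df2.set_index("category")[categories_col]

# Sanity check: only 0/1 values, and exactly one 1 per row.
only_zero_one = onehot.isin([0, 1]).all().all()
one_per_row = (onehot.sum(axis=1) == 1).all()
if not (only_zero_one and one_per_row):
    raise ValueError("categories_col is not one-hot encoded")

# idxmax(axis=1) gives, for each row, the label of the column holding the 1.
lookup = onehot.idxmax(axis=1)

# Look up each value of df1["likes"] by key; values missing from df2 become NaN.
df1["new"] = df1["likes"].map(lookup)
print(df1)
```

Because the lookup goes through the category name, the row order and row count of df2 do not matter here.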
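Answer 2:
You can use apply() on df2 to label each category, then look the labels up through df1's likes column:
df1['type_string'] = df1['likes'].map(df2.set_index('category').apply(lambda x: 'Fruit' if x.Fruit else 'Flower', axis=1))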
Here is a running example:
import pandas as pd
from io import StringIO
df1 = pd.read_csv(StringIO(
"""
0 A 21 rose
1 B 22 apple
2 C 30 grapes
4 D 21 lily
"""), sep=r'\s+', header=None)
df2 = pd.read_csv(StringIO(
"""
orange 1 0
apple 1 0
rose 0 1
lily 0 1
grapes 1 0
"""), sep=r'\s+', header=None)
df1.columns = ['id', 'name', 'age', 'likes']
df2.columns = ['category', 'Fruit', 'Flower']
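# Label each df2 row, index the labels by category, then look up each value of df1['likes'].
labels = df2.set_index('category').apply(lambda x: 'Fruit' if x.Fruit else 'Flower', axis=1)
df1['category'] = df1['likes'].map(labels)
Input
id name age likes
0 0 A 21 rose
1 1 B 22 apple
2 2 C 30 grapes
3 4 D 21 lily
Output
id name age likes category
0 0 A 21 rose Flower
1 1 B 22 apple Fruit
2 2 C 30 grapes Fruit
3 4 D 21 lily Flower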
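One thing worth checking with the apply() approach: assigning a Series to a DataFrame column aligns on the row index, not on the values in likes. The sketch below contrasts the two behaviours on the data from the question; by_position and by_key are illustrative names, not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({"likes": ["rose", "apple", "grapes", "lily"]})
df2 = pd.DataFrame({
    "category": ["orange", "apple", "rose", "lily", "grapes"],
    "Fruit": [1, 1, 0, 0, 1],
    "Flower": [0, 0, 1, 1, 0],
})

# One label per df2 row, indexed 0..4 like df2 itself.
labels = df2.apply(lambda x: "Fruit" if x.Fruit else "Flower", axis=1)

# Index-aligned assignment: row i of df1 receives the label of row i of df2,
# regardless of what df1["likes"] contains.
by_position = df1.assign(category=labels)

# Key-based lookup: re-index the labels by category name, then map "likes" through them.
labels_by_key = pd.Series(labels.to_numpy(), index=df2["category"].to_numpy())
by_key = df1.assign(category=df1["likes"].map(labels_by_key))

print(by_position)
print(by_key)
```

In by_position the first row (rose) picks up the label of orange, because both sit at index 0. In by_key each row is labelled according to its own likes value.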
Answer 3:
The trick lies in the fact that the two tables have different numbers of rows. The examples above might also not work if df2 has more categories than df1 contains.
Here's a working example:
df1 = pd.DataFrame([['orange',12],['rose',3],['apple',44],['grapes',1]], columns = ['name', 'age'])
df1
name age
0 orange 12
1 rose 3
2 apple 44
3 grapes 1
df2 = pd.DataFrame([['orange',1],['rose',0],['apple',1],['grapes',1],['daffodils',0],['berries',1]], columns = ['cat', 'Fruit'])
df2
cat Fruit
0 orange 1
1 rose 0
2 apple 1
3 grapes 1
4 daffodils 0
5 berries 1
In one line: merge df1 and df2 on the fly, using df1.name = df2.cat as the key, and run a list comprehension with a conditional over the Fruit column of the merged result:
df1['flag'] = ['Fruit' if i == 1 else 'Flower' for i in df1.merge(df2,how='left',left_on='name', right_on='cat').Fruit]
df1
Output
name age flag
0 orange 12 Fruit
1 rose 3 Flower
2 apple 44 Fruit
3 grapes 1 Fruit
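A caveat with the one-liner: with how='left', a name that has no match in df2 gets NaN in the Fruit column, and the conditional then labels it 'Flower'. Here is a hedged sketch of one way to keep such rows distinguishable. The 'tulip' row and the 'unknown' label are made up for illustration, and it assumes the cat values in df2 are unique.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    [["orange", 12], ["rose", 3], ["apple", 44], ["tulip", 7]],  # "tulip" is a hypothetical unmatched name
    columns=["name", "age"],
)
df2 = pd.DataFrame(
    [["orange", 1], ["rose", 0], ["apple", 1], ["grapes", 1], ["daffodils", 0], ["berries", 1]],
    columns=["cat", "Fruit"],
)

# Look up the Fruit flag by name; names absent from df2 come back as NaN.
fruit_flag = df1["name"].map(df2.set_index("cat")["Fruit"])

# np.select checks the conditions in order; rows matching neither get the default.
df1["flag"] = np.select(
    [fruit_flag == 1, fruit_flag == 0],
    ["Fruit", "Flower"],
    default="unknown",
)
print(df1)
```

The extra categories in df2 (daffodils, berries) are simply never looked up, so they do not affect the result.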
Source: https://stackoverflow.com/questions/53078951/decode-one-hot-dataframe-in-pandas