Decode one-hot dataframe in Pandas

Posted by 家住魔仙堡 on 2021-02-19 03:56:46

Question


I have 2 dataframes with the data as below:

df1:
====
id   name   age   likes
---  -----  ----  -----
0     A      21    rose
1     B      22    apple
2     C      30    grapes
4     D      21    lily

df2:
====
category    Fruit   Flower 
---------  -------  -------
orange      1        0
apple       1        0       
rose        0        1
lily        0        1
grapes      1        0

What I am trying to do is add another column to df1 which would contain the word 'Fruit' or 'Flower' depending on the one-hot encoding in df2 for that entry. I am looking for a purely pandas/numpy implementation.

Any help would be appreciated.

Thanks!


Answer 1:


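IIUC, you can merge the two frames on the item name and then use .apply with axis=1 (or axis="columns"), which applies the function to each row.

# how='left' keeps every row of df1, in df1's order, even when df2 has no match
df3 = df1.merge(df2, how='left', left_on='likes', right_on='category')

# list your one-hot columns here
categories_col = ['Fruit', 'Flower']

def get_category(x):
    # return the name of the first one-hot column set to 1 (None if there is none)
    for category in categories_col:
        if x[category] == 1:
            return category

# df3 gets a fresh 0..n-1 index, so this assumes df1 has the default RangeIndex too
df1["new"] = df3.apply(get_category, axis=1)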

print(df1)
    id  name    age likes   new
0   0   A   21  rose    Flower
1   1   B   22  apple   Fruit
2   2   C   30  grapes  Fruit  
3   4   D   21  lily    Flower

Note that this only works if the columns listed in categories_col are one-hot encoded, i.e. exactly one of them is 1 in each row.
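As a vectorized alternative to the row-wise apply, idxmax(axis=1) can decode the one-hot columns directly. This is a minimal sketch, not part of the original answer; the names one_hot_cols and decoded are illustrative, and the frames are rebuilt from the question's data. It assumes every row of df2 has exactly one 1 (for an all-zero row, idxmax would simply return the first column).

```python
import pandas as pd

# Data from the question
df1 = pd.DataFrame({
    "id": [0, 1, 2, 4],
    "name": ["A", "B", "C", "D"],
    "age": [21, 22, 30, 21],
    "likes": ["rose", "apple", "grapes", "lily"],
})
df2 = pd.DataFrame({
    "category": ["orange", "apple", "rose", "lily", "grapes"],
    "Fruit": [1, 1, 0, 0, 1],
    "Flower": [0, 0, 1, 1, 0],
})

one_hot_cols = ["Fruit", "Flower"]  # illustrative name

# For each row, idxmax(axis=1) returns the label of the column holding
# the largest value, which is the column containing the 1.
decoded = df2.set_index("category")[one_hot_cols].idxmax(axis=1)

# Look up each entry of likes in the decoded Series (NaN if it is missing)
df1["new"] = df1["likes"].map(decoded)
print(df1)
```

Because the lookup goes through map, the result does not depend on df1's index or on the row order of df2.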




Answer 2:


You can use apply() for that. Build the label from df2, key it by category, and map it onto the likes column of df1 (assigning the result of df2.apply directly would align on the row index rather than on the item name):

df1['type_string'] = df1['likes'].map(df2.set_index('category').apply(lambda x: 'Fruit' if x.Fruit else 'Flower', axis=1))

Here is a running example:

import pandas as pd
from io import StringIO

df1 = pd.read_csv(StringIO(
"""
0     A      21    rose
1     B      22    apple
2     C      30    grapes
4     D      21    lily
"""), sep=r'\s+', header=None)

df2 = pd.read_csv(StringIO(
"""
orange      1        0
apple       1        0
rose        0        1
lily        0        1
grapes      1        0
"""), sep=r'\s+', header=None)

df1.columns = ['id', 'name', 'age', 'likes']
df2.columns = ['category', 'Fruit', 'Flower']

# one label per df2 row, indexed by the item name
lookup = df2.set_index('category').apply(lambda x: 'Fruit' if x.Fruit else 'Flower', axis=1)
df1['category'] = df1['likes'].map(lookup)

Input

   id name  age   likes
0   0    A   21    rose
1   1    B   22   apple
2   2    C   30  grapes
3   4    D   21    lily

Output

   id name  age   likes category
0   0    A   21    rose   Flower
1   1    B   22   apple    Fruit
2   2    C   30  grapes    Fruit
3   4    D   21    lily   Flower
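One thing to watch with this kind of assignment: pandas aligns an assigned Series with df1 by index label, not by the value in likes. The sketch below contrasts the two behaviours on the question's data; by_position, by_value and lookup are illustrative names, not something from the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({"likes": ["rose", "apple", "grapes", "lily"]})
df2 = pd.DataFrame({
    "category": ["orange", "apple", "rose", "lily", "grapes"],
    "Fruit": [1, 1, 0, 0, 1],
    "Flower": [0, 0, 1, 1, 0],
})

# One label per row of df2, indexed 0..4 like df2 itself
labels = df2.apply(lambda x: "Fruit" if x.Fruit else "Flower", axis=1)

# Index-aligned (what df1[col] = labels does): row 0 of df1 (rose)
# receives the label of row 0 of df2 (orange).
by_position = labels.reindex(df1.index)

# Value-based: key the labels by category, then look up each entry of likes
lookup = dict(zip(df2["category"], labels))
by_value = df1["likes"].map(lookup)

print(pd.DataFrame({
    "likes": df1["likes"],
    "by_position": by_position,
    "by_value": by_value,
}))
```

Only the by_value column labels rose as Flower and grapes as Fruit; the index-aligned column just copies the first four labels of df2.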



Answer 3:


The catch is that the two tables have a different number of rows, so the examples above might not work if df2 has more categories than appear in df1.

Here's a working example:

import pandas as pd

df1 = pd.DataFrame([['orange',12],['rose',3],['apple',44],['grapes',1]], columns = ['name', 'age'])


df1
    name    age
0   orange  12
1   rose    3
2   apple   44
3   grapes  1

df2 = pd.DataFrame([['orange',1],['rose',0],['apple',1],['grapes',1],['daffodils',0],['berries',1]], columns = ['cat', 'Fruit'])

df2
    cat         Fruit
0   orange      1
1   rose        0
2   apple       1
3   grapes      1
4   daffodils   0
5   berries     1

In a single line: left-merge df1 with df2 on the fly (key: df1.name == df2.cat) and run a list comprehension with a conditional over the merged Fruit column:

df1['flag'] = ['Fruit' if i == 1 else 'Flower' for i in df1.merge(df2,how='left',left_on='name', right_on='cat').Fruit]
df1

Output:

    name    age  flag
0   orange  12   Fruit
1   rose    3    Flower
2   apple   44   Fruit
3   grapes  1    Fruit
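With a left merge, a name that does not appear in df2 ends up with NaN in Fruit, and the `i == 1 else 'Flower'` test would then label it 'Flower'. If that matters, a three-way decision can be sketched with np.select. This is only a sketch under assumptions: the 'tulip' row, the 'unknown' label and the name fruit_flag are made up for illustration and are not in the original answer.

```python
import numpy as np
import pandas as pd

# 'tulip' is a hypothetical entry with no match in df2
df1 = pd.DataFrame([["orange", 12], ["rose", 3], ["tulip", 7]],
                   columns=["name", "age"])
df2 = pd.DataFrame([["orange", 1], ["rose", 0], ["apple", 1]],
                   columns=["cat", "Fruit"])

# Look up the Fruit flag by name; names missing from df2 become NaN
fruit_flag = df1["name"].map(df2.set_index("cat")["Fruit"])

# Conditions are checked in order; rows matching neither get the default
df1["flag"] = np.select(
    [fruit_flag.eq(1).to_numpy(), fruit_flag.eq(0).to_numpy()],
    ["Fruit", "Flower"],
    default="unknown",
)
print(df1)
```

Since NaN compares unequal to both 1 and 0, unmatched names fall through to the default instead of being counted as flowers.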


Source: https://stackoverflow.com/questions/53078951/decode-one-hot-dataframe-in-pandas
