How to filter dataframe by splitting categories of a columns into sets?

你离开我真会死。 提交于 2021-02-17 02:06:10

问题


I have a dataframe:

Prop_ID    Unit_ID      Prop_Usage                     Unit_Usage
1          1            RESIDENTIAL                    RESIDENTIAL
1          2            RESIDENTIAL                    COMMERCIAL
1          3            RESIDENTIAL                    INDUSTRIAL
1          4            RESIDENTIAL                    RESIDENTIAL
2          1            COMMERCIAL                     RESIDENTIAL
2          2            COMMERCIAL                     COMMERCIAL
2          3            COMMERCIAL                     COMMERCIAL
3          1            INDUSTRIAL                     INDUSTRIAL
3          2            INDUSTRIAL                     COMMERCIAL
4          1            RESIDENTIAL - COMMERCIAL       RESIDENTIAL
4          2            RESIDENTIAL - COMMERCIAL       COMMERCIAL
4          3            RESIDENTIAL - COMMERCIAL       INDUSTRIAL
5          1            COMMERCIAL / RESIDENTIAL       RESIDENTIAL
5          2            COMMERCIAL / RESIDENTIAL       COMMERCIAL
5          3            COMMERCIAL / RESIDENTIAL       INDUSTRIAL
5          4            COMMERCIAL / RESIDENTIAL       COMMERCIAL

One property may have more than 1 unit. That means units are the subcategory of properties. I want to filter rows where Prop_Usage does not match with Unit_Usage. We have a category in Prop_Usage column that's RESIDENTIAL - COMMERCIAL then Unit_Usage can be either RESIDENTIAL or COMMERCIAL. Similarly for COMMERCIAL / RESIDENTIAL.

Expected Output:

Prop_ID    Unit_ID      Prop_Usage                   Unit_Usage
1          2            RESIDENTIAL                  COMMERCIAL
1          3            RESIDENTIAL                  INDUSTRIAL
2          1            COMMERCIAL                   RESIDENTIAL
3          2            INDUSTRIAL                   COMMERCIAL
4          3            RESIDENTIAL - COMMERCIAL     INDUSTRIAL
5          3            COMMERCIAL / RESIDENTIAL     INDUSTRIAL

回答1:


Use in statement in DataFrame.apply:

df = df[~df.apply(lambda x: x['Unit_Usage'] in x['Prop_Usage'], axis=1)]

Or use zip in list comprehension:

df = df[[not a in b for a, b in zip(df['Unit_Usage'], df['Prop_Usage'])]]

print (df)
    Prop_ID  Unit_ID                Prop_Usage   Unit_Usage
1         1        2               RESIDENTIAL   COMMERCIAL
2         1        3               RESIDENTIAL   INDUSTRIAL
4         2        1                COMMERCIAL  RESIDENTIAL
8         3        2                INDUSTRIAL   COMMERCIAL
11        4        3  RESIDENTIAL - COMMERCIAL   INDUSTRIAL
14        5        3  COMMERCIAL / RESIDENTIAL   INDUSTRIAL


来源:https://stackoverflow.com/questions/59692738/how-to-filter-dataframe-by-splitting-categories-of-a-columns-into-sets

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!