Question
I have a dataframe:
Prop_ID  Unit_ID  Prop_Usage                Unit_Usage
1        1        RESIDENTIAL               RESIDENTIAL
1        2        RESIDENTIAL               COMMERCIAL
1        3        RESIDENTIAL               INDUSTRIAL
1        4        RESIDENTIAL               RESIDENTIAL
2        1        COMMERCIAL                RESIDENTIAL
2        2        COMMERCIAL                COMMERCIAL
2        3        COMMERCIAL                COMMERCIAL
3        1        INDUSTRIAL                INDUSTRIAL
3        2        INDUSTRIAL                COMMERCIAL
4        1        RESIDENTIAL - COMMERCIAL  RESIDENTIAL
4        2        RESIDENTIAL - COMMERCIAL  COMMERCIAL
4        3        RESIDENTIAL - COMMERCIAL  INDUSTRIAL
5        1        COMMERCIAL / RESIDENTIAL  RESIDENTIAL
5        2        COMMERCIAL / RESIDENTIAL  COMMERCIAL
5        3        COMMERCIAL / RESIDENTIAL  INDUSTRIAL
5        4        COMMERCIAL / RESIDENTIAL  COMMERCIAL
One property may have more than one unit, so units are subcategories of properties. I want to keep only the rows where Prop_Usage does not match Unit_Usage. When Prop_Usage is a combined category such as RESIDENTIAL - COMMERCIAL, Unit_Usage may be either RESIDENTIAL or COMMERCIAL. The same applies to COMMERCIAL / RESIDENTIAL.
Expected Output:
Prop_ID  Unit_ID  Prop_Usage                Unit_Usage
1        2        RESIDENTIAL               COMMERCIAL
1        3        RESIDENTIAL               INDUSTRIAL
2        1        COMMERCIAL                RESIDENTIAL
3        2        INDUSTRIAL                COMMERCIAL
4        3        RESIDENTIAL - COMMERCIAL  INDUSTRIAL
5        3        COMMERCIAL / RESIDENTIAL  INDUSTRIAL
Answer 1:
Use the in operator (a substring test when both operands are strings) inside DataFrame.apply with axis=1:
df = df[~df.apply(lambda x: x['Unit_Usage'] in x['Prop_Usage'], axis=1)]
Or use zip in a list comprehension:
df = df[[a not in b for a, b in zip(df['Unit_Usage'], df['Prop_Usage'])]]
print(df)
    Prop_ID  Unit_ID                Prop_Usage   Unit_Usage
1         1        2               RESIDENTIAL   COMMERCIAL
2         1        3               RESIDENTIAL   INDUSTRIAL
4         2        1                COMMERCIAL  RESIDENTIAL
8         3        2                INDUSTRIAL   COMMERCIAL
11        4        3  RESIDENTIAL - COMMERCIAL   INDUSTRIAL
14        5        3  COMMERCIAL / RESIDENTIAL   INDUSTRIAL
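Both snippets above rely on substring matching, which works for this sample but would also treat a unit usage as matching whenever it merely appears inside a longer label. A minimal sketch of a set-based variant follows, which may be closer to what the question title describes. It assumes combined categories are always separated by " - " or " / " as in the sample data; the helper name usage_set is hypothetical and not part of pandas.

```python
import re

import pandas as pd

# Sample data reconstructed from the question.
rows = [
    (1, 1, "RESIDENTIAL", "RESIDENTIAL"),
    (1, 2, "RESIDENTIAL", "COMMERCIAL"),
    (1, 3, "RESIDENTIAL", "INDUSTRIAL"),
    (1, 4, "RESIDENTIAL", "RESIDENTIAL"),
    (2, 1, "COMMERCIAL", "RESIDENTIAL"),
    (2, 2, "COMMERCIAL", "COMMERCIAL"),
    (2, 3, "COMMERCIAL", "COMMERCIAL"),
    (3, 1, "INDUSTRIAL", "INDUSTRIAL"),
    (3, 2, "INDUSTRIAL", "COMMERCIAL"),
    (4, 1, "RESIDENTIAL - COMMERCIAL", "RESIDENTIAL"),
    (4, 2, "RESIDENTIAL - COMMERCIAL", "COMMERCIAL"),
    (4, 3, "RESIDENTIAL - COMMERCIAL", "INDUSTRIAL"),
    (5, 1, "COMMERCIAL / RESIDENTIAL", "RESIDENTIAL"),
    (5, 2, "COMMERCIAL / RESIDENTIAL", "COMMERCIAL"),
    (5, 3, "COMMERCIAL / RESIDENTIAL", "INDUSTRIAL"),
    (5, 4, "COMMERCIAL / RESIDENTIAL", "COMMERCIAL"),
]
df = pd.DataFrame(rows, columns=["Prop_ID", "Unit_ID", "Prop_Usage", "Unit_Usage"])


def usage_set(prop_usage):
    """Hypothetical helper: split a label on ' - ' or ' / ' into a set of categories."""
    # Assumption: separators are a hyphen or slash surrounded by whitespace.
    return {part.strip() for part in re.split(r"\s[-/]\s", prop_usage)}


# True for rows whose unit usage is not one of the property's categories.
mismatch = [
    unit not in usage_set(prop)
    for unit, prop in zip(df["Unit_Usage"], df["Prop_Usage"])
]
result = df[mismatch]
print(result)
```

On the sample data this selects the same six rows as the substring versions. The difference only shows up when one category name is contained in another.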
Source: https://stackoverflow.com/questions/59692738/how-to-filter-dataframe-by-splitting-categories-of-a-columns-into-sets