问题
Suppose I have two dataframes - conditions and data.
import pandas as pd
conditions = pd.DataFrame({'class': [1,2,3,4,4,5,5,4,4,5,5,5],
'primary_lower': [0,0,0,160,160,160,160,160,160,160,160,800],
'primary_upper':[9999,9999,9999,480,480,480,480,480,480,480,480,4000],
'secondary_lower':[0,0,0,3500,6100,3500,6100,0,4800,0,4800,10],
'secondary_upper':[9999,9999,9999,4700,9999,4700,9999,4699,6000,4699,6000,3000],
'group':['A','A','A','B','B','B','B','C','C','C','C','C']})
data = pd.DataFrame({'class':[1,1,4,4,5,5,2],
'primary':[2000,9100,1100,170,300,210,1000],
'secondary':[1232,3400,2400,380,3600,4800,8600]})
I'd like to generate a new column (group) in the "data" table that assigns a group to each row given the conditions provided in the "conditions" table.
The conditions table is structured so that rows within each group are joined by "OR"s and columns are joined by "AND"s. For example, to be assigned group "B":
(class = 4 AND 160<=primary<=480 AND 3500<=secondary<=4700)
OR
(class = 4 AND 160<=primary<=480 AND 6100<=secondary<=9999)
OR
(class = 5 AND 160<=primary<=480 AND 3500<=secondary<=4700)
OR
(class = 5 AND 160<=primary<=480 AND 6100<=secondary<=9999)
Any rows that don't match any of the conditions will get assigned group "Other". So, the final dataframe would then look like this:
+-------+---------+-----------+-------+
| class | primary | secondary | group |
+-------+---------+-----------+-------+
| 1 | 2000 | 1232 | A |
| 1 | 9100 | 3400 | A |
| 4 | 1100 | 2400 | Other |
| 4 | 170 | 380 | C |
| 5 | 300 | 3600 | B |
| 5 | 210 | 4800 | C |
| 2 | 1000 | 8600 | A |
+-------+---------+-----------+-------+
回答1:
You can iterate a GroupBy
object and take the union of the masks within each group:
for key, grp in conditions.groupby('group'):
cols = ['class', 'primary_lower', 'primary_upper',
'secondary_lower', 'secondary_upper']
masks = (data['class'].eq(cls) & \
data['primary'].between(prim_lower, prim_upper) & \
data['secondary'].between(sec_lower, sec_upper) \
for cls, prim_lower, prim_upper, sec_lower, sec_upper in \
grp[cols].itertuples(index=False))
data.loc[pd.concat(masks, axis=1).any(1), 'group'] = key
data['group'] = data['group'].fillna('Other')
Result:
print(data)
class primary secondary group
0 1 2000 1232 A
1 1 9100 3400 A
2 4 1100 2400 Other
3 4 170 380 C
4 5 300 3600 C
5 5 210 4800 C
6 2 1000 8600 A
Note index=4
has a different result to your desired output since there are multiple conditions which satisfy the data.
来源:https://stackoverflow.com/questions/52805218/create-column-based-on-multiple-column-conditions-from-another-dataframe