问题
I have a data frame with:
customer_id [1,2,3,4,5,6,7,8,9,10]
feature1 [0,0,1,1,0,0,1,1,0,0]
feature2 [1,0,1,0,1,0,1,0,1,0]
feature3 [0,0,1,0,0,0,1,0,0,0]
Using this I want to create a new variable (say new_var) to say when feature 1 is 1 then the new_var=1, if feature_2=1 then new_var=2, feature3=1 then new_var=3 else 4. I was trying np.where but though it doesn't give me an error, it doesn't do the right thing - so I guess a nested np.where works on a single variable only. In which case, what's the best way to perform a nested if/case when in pandas?
My np.where code was something like this:
df[new_var]=np.where(df['feature1']==1,'1', np.where(df['feature2']==1,'2', np.where(df[feature3']==1,'3','4')))
回答1:
I think you need numpy.select - it select first True
values and all another are not important:
m1 = df['feature1']==1
m2 = df['feature2']==1
m3 = df['feature3']==1
df['new_var'] = np.select([m1, m2, m3], ['1', '2', '3'], default='4')
Sample:
customer_id = [1,2,3,4,5,6,7,8,9,10]
feature1 = [0,0,1,1,0,0,1,1,0,0]
feature2 = [1,0,1,0,1,0,1,0,1,0]
feature3 = [0,0,1,0,0,0,1,0,0,0]
df = pd.DataFrame({'customer_id':customer_id,
'feature1':feature1,
'feature2':feature2,
'feature3':feature3})
m1 = df['feature1']==1
m2 = df['feature2']==1
m3 = df['feature3']==1
df['new_var'] = np.select([m1, m2, m3], ['1', '2', '3'], default='4')
print (df)
customer_id feature1 feature2 feature3 new_var
0 1 0 1 0 2
1 2 0 0 0 4
2 3 1 1 1 1
3 4 1 0 0 1
4 5 0 1 0 2
5 6 0 0 0 4
6 7 1 1 1 1
7 8 1 0 0 1
8 9 0 1 0 2
9 10 0 0 0 4
If in features
only 1
and 0
is possible convert 0
to False
and 1
to True
:
m1 = df['feature1'].astype(bool)
m2 = df['feature2'].astype(bool)
m3 = df['feature3'].astype(bool)
df['new_var'] = np.select([m1, m2, m3], ['1', '2', '3'], default='4')
print (df)
customer_id feature1 feature2 feature3 new_var
0 1 0 1 0 2
1 2 0 0 0 4
2 3 1 1 1 1
3 4 1 0 0 1
4 5 0 1 0 2
5 6 0 0 0 4
6 7 1 1 1 1
7 8 1 0 0 1
8 9 0 1 0 2
9 10 0 0 0 4
来源:https://stackoverflow.com/questions/45609903/np-where-on-multiple-variables