Need help with Pandas multiple IF-ELSE statements. I have a test dataset (titanic) as follows:
ID Survived Pclass Name Sex Age
1 0 3 Braund male
Instead of looping through the rows using df.iterrows
(which is relatively slow), you can assign the desired values to the Prediction
column in one assignment:
In [27]: df['Prediction'] = ((df['Sex']=='female') | ((df['Pclass']==1) & (df['Age']<18))).astype('int')
In [29]: df['Prediction']
Out[29]:
0 0
1 1
2 1
3 1
4 0
5 0
6 0
7 0
Name: Prediction, dtype: int32
For your first approach, remember that df['Prediction']
represents an entire column of df
, so df['Prediction']=1
assigns the value 1 to each row in that column. Since df['Prediction']=0
was the last assignment, the entire column ended up being filled with zeros.
For your second approach, note that you need to use &
not and
to perform an elementwise logical-and operation on two NumPy arrays or Pandas NDFrames. Thus, you could use
In [32]: np.where(df['Sex']=='female', 1, np.where((df['Pclass']==1)&(df['Age']<18), 1, 0))
Out[32]: array([0, 1, 1, 1, 0, 0, 0, 0])
though I think it is much simpler to just use |
for logical-or and &
for logical-and:
In [34]: ((df['Sex']=='female') | ((df['Pclass']==1) & (df['Age']<18)))
Out[34]:
0 False
1 True
2 True
3 True
4 False
5 False
6 False
7 False
dtype: bool