I have a dataframe:
a b c
0 nan Y nan
1 23 N 3
2 nan N 2
3 44 Y nan
I wish to have this output:
Since you just want Nan
s to be propagated, multiplying the columns takes care of that for you:
>>> df = pd.read_clipboard()
>>> df
a b c
0 NaN Y NaN
1 23.0 N 3.0
2 NaN N 2.0
3 44.0 Y NaN
>>> df.a * df.c
0 NaN
1 69.0
2 NaN
3 NaN
dtype: float64
>>>
If you want to do it on a condition, you can use np.where
here instead of .apply
. all you need is the following:
>>> df
a b c
0 NaN Y NaN
1 23.0 N 3.0
2 NaN N 2.0
3 44.0 Y NaN
>>> np.where(df.b == 'N', df.a*df.c, df.a)
array([ nan, 69., nan, 44.])
This is the default behavior for most operations involving Nan
. So, you can simply assign the result of the above:
>>> df['d'] = np.where(df.b == 'N', df.a*df.c, df.a)
>>> df
a b c d
0 NaN Y NaN NaN
1 23.0 N 3.0 69.0
2 NaN N 2.0 NaN
3 44.0 Y NaN 44.0
>>>
Just to elaborate on what this:
np.where(df.b == 'N', df.a*df.c, df.a)
Is doing, you can think of it as "where df.b == 'N', give me the result of df.a * df.c
, else, give me just df.a
:
>>> np.where(df.b == 'N', df.a*df.c, df.a)
array([ nan, 69., nan, 44.])
Also note, if your dataframe were a little different:
>>> df
a b c
0 NaN Y NaN
1 23.0 Y 3.0
2 NaN N 2.0
3 44.0 Y NaN
>>> df.loc[0,'a'] = 99
>>> df.loc[0, 'b']= 'N'
>>> df
a b c
0 99.0 N NaN
1 23.0 N 3.0
2 NaN N 2.0
3 44.0 Y NaN
Then the following would not be equivalent:
>>> np.where(df.b == 'N', df.a*df.c, df.a)
array([ nan, 69., nan, 44.])
>>> np.where((df.b == 'N') & (~df.c.isnull()), df.a*df.c, df.a)
array([ 99., 69., nan, 44.])
So you might want to use the slightly more verbose:
>>> df['d'] = np.where((df.b == 'N') & (~df.c.isnull()), df.a*df.c, df.a)
>>> df
a b c d
0 99.0 N NaN 99.0
1 23.0 N 3.0 69.0
2 NaN N 2.0 NaN
3 44.0 Y NaN 44.0
>>>