I have a dataframe:
     a  b    c
0  nan  Y  nan
1   23  N    3
2  nan  N    2
3   44  Y  nan
I wish to have this output:
You don't need apply; use np.where:
df['d'] = np.where(df.a.isnull(),
                   np.nan,
                   np.where((df.b == "N") & (~df.c.isnull()),
                            df.a * df.c,
                            df.a))
Output:
      a  b    c     d
0   NaN  Y  NaN   NaN
1  23.0  N  3.0  69.0
2   NaN  N  2.0   NaN
3  44.0  Y  NaN  44.0
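For anyone who wants to reproduce this without copying the frame from the clipboard, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the values in the question, and `.notnull()` is used in place of `~.isnull()`, which should be equivalent here.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question (values copied from the post).
df = pd.DataFrame({
    "a": [np.nan, 23, np.nan, 44],
    "b": ["Y", "N", "N", "Y"],
    "c": [np.nan, 3, 2, np.nan],
})

# Outer where: keep NaN wherever a is missing.
# Inner where: multiply only when b == "N" and c is present; otherwise keep a.
df["d"] = np.where(
    df["a"].isnull(),
    np.nan,
    np.where((df["b"] == "N") & df["c"].notnull(), df["a"] * df["c"], df["a"]),
)

print(df)
```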
Use
pd.isnull(df['Description'][i])
You can try
df['d'] = np.where((df.b == 'N') & (pd.notnull(df.c)), df.a*df.c, np.where(pd.notnull(df.a), df.a, np.nan))
      a  b    c     d
0   NaN  Y  NaN   NaN
1  23.0  N  3.0  69.0
2   NaN  N  2.0   NaN
3  44.0  Y  NaN  44.0
See the pandas documentation for notnull. In your current code, you just need to change series.notnull to pd.notnull(series) for it to work, though np.where should be more efficient:
def f4(row):
    # NaN never compares equal to anything, so use pd.isnull rather than == np.nan
    if pd.isnull(row['a']):
        return np.nan
    elif (row['b'] == "N") and pd.notnull(row['c']):
        return row['a'] * row['c']
    else:
        return row['a']

df['d'] = df.apply(f4, axis=1)
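Two details matter in a row-wise function like this one. The values pulled out of a row are plain scalars, so they presumably have no `.notnull` method (the asker's original code is not shown, so that is an assumption about what failed), and NaN never compares equal to anything, including itself. A small sketch of both points, using a made-up row:

```python
import numpy as np
import pandas as pd

# A hypothetical single row, shaped like what df.apply(..., axis=1) passes in.
row = pd.Series({"a": np.nan, "b": "N", "c": 3.0})
value = row["a"]  # a plain float scalar, not a Series

# Scalars do not carry the Series methods.
has_method = hasattr(value, "notnull")

# Equality against NaN is always False, even when the value is NaN.
eq_check = (value == np.nan)

# The top-level pandas functions accept scalars as well as Series.
null_check = pd.isnull(value)
notnull_check = pd.notnull(row["c"])

print(has_method, eq_check, null_check, notnull_check)
```

With this row, the checks should come out as False, False, True, True, which is why `pd.isnull` and `pd.notnull` are the safer choice inside the function.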
Since you just want NaNs to be propagated, multiplying the columns takes care of that for you:
>>> df = pd.read_clipboard()
>>> df
      a  b    c
0   NaN  Y  NaN
1  23.0  N  3.0
2   NaN  N  2.0
3  44.0  Y  NaN
>>> df.a * df.c
0     NaN
1    69.0
2     NaN
3     NaN
dtype: float64
>>>
If you want to do it on a condition, you can use np.where here instead of .apply. All you need is the following:
>>> df
      a  b    c
0   NaN  Y  NaN
1  23.0  N  3.0
2   NaN  N  2.0
3  44.0  Y  NaN
>>> np.where(df.b == 'N', df.a*df.c, df.a)
array([ nan, 69., nan, 44.])
This is the default behavior for most operations involving NaN. So, you can simply assign the result of the above:
>>> df['d'] = np.where(df.b == 'N', df.a*df.c, df.a)
>>> df
      a  b    c     d
0   NaN  Y  NaN   NaN
1  23.0  N  3.0  69.0
2   NaN  N  2.0   NaN
3  44.0  Y  NaN  44.0
>>>
Just to elaborate on what np.where(df.b == 'N', df.a*df.c, df.a) is doing: you can think of it as "where df.b == 'N', give me the result of df.a * df.c; otherwise, give me just df.a":
>>> np.where(df.b == 'N', df.a*df.c, df.a)
array([ nan, 69., nan, 44.])
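To make that reading concrete, here is a rough sketch comparing the vectorized call with an element-by-element loop. The loop is for illustration only and the frame is rebuilt by hand from the question. Note that np.where computes both `df.a * df.c` and `df.a` for every row first and only then picks between them.

```python
import numpy as np
import pandas as pd

# Frame rebuilt from the question.
df = pd.DataFrame({
    "a": [np.nan, 23, np.nan, 44],
    "b": ["Y", "N", "N", "Y"],
    "c": [np.nan, 3, 2, np.nan],
})

# Vectorized: pick a*c where b == "N", otherwise a.
vectorized = np.where(df["b"] == "N", df["a"] * df["c"], df["a"])

# The same selection spelled out one row at a time (slow, illustration only).
looped = np.array([
    a * c if b == "N" else a
    for a, b, c in zip(df["a"], df["b"], df["c"])
])

print(vectorized)
print(looped)
```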
Also note, if your dataframe were a little different:
>>> df
      a  b    c
0   NaN  Y  NaN
1  23.0  N  3.0
2   NaN  N  2.0
3  44.0  Y  NaN
>>> df.loc[0,'a'] = 99
>>> df.loc[0, 'b']= 'N'
>>> df
      a  b    c
0  99.0  N  NaN
1  23.0  N  3.0
2   NaN  N  2.0
3  44.0  Y  NaN
Then the following would not be equivalent:
>>> np.where(df.b == 'N', df.a*df.c, df.a)
array([ nan, 69., nan, 44.])
>>> np.where((df.b == 'N') & (~df.c.isnull()), df.a*df.c, df.a)
array([ 99., 69., nan, 44.])
So you might want to use the slightly more verbose:
>>> df['d'] = np.where((df.b == 'N') & (~df.c.isnull()), df.a*df.c, df.a)
>>> df
      a  b    c     d
0  99.0  N  NaN  99.0
1  23.0  N  3.0  69.0
2   NaN  N  2.0   NaN
3  44.0  Y  NaN  44.0
>>>
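The difference between the two conditions can be checked side by side. This is a minimal sketch on the modified frame (row 0 set to a = 99, b = 'N'), rebuilt by hand; the names `simple` and `guarded` are just labels chosen for this example.

```python
import numpy as np
import pandas as pd

# The modified frame: row 0 has a value in a, b == "N", and a missing c.
df = pd.DataFrame({
    "a": [99, 23, np.nan, 44],
    "b": ["N", "N", "N", "Y"],
    "c": [np.nan, 3, 2, np.nan],
})

# Condition on b only: row 0 becomes NaN because 99 * NaN is NaN.
simple = np.where(df["b"] == "N", df["a"] * df["c"], df["a"])

# Condition on b and on c being present: row 0 falls back to a.
guarded = np.where((df["b"] == "N") & df["c"].notnull(), df["a"] * df["c"], df["a"])

print(simple)
print(guarded)
```

Which version is right depends on whether a missing c should wipe out the value in a or leave it untouched.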