Question
I want to find the difference between two columns of type int in a pandas DataFrame. I am using Python 2.7. The columns are as below:
>>> df
INVOICED_QUANTITY QUANTITY_SHIPPED
0 15 NaN
1 20 NaN
2 7 NaN
3 7 NaN
4 7 NaN
Now, I want to subtract QUANTITY_SHIPPED from INVOICED_QUANTITY, so I do the following:
>>> df['Diff'] = df['INVOICED_QUANTITY'] - df['QUANTITY_SHIPPED']
>>> df
INVOICED_QUANTITY QUANTITY_SHIPPED Diff
0 15 NaN NaN
1 20 NaN NaN
2 7 NaN NaN
3 7 NaN NaN
4 7 NaN NaN
How do I take care of the NaNs? I would like to get the result below, since I want NaN to be treated as 0 (zero):
>>> df
INVOICED_QUANTITY QUANTITY_SHIPPED Diff
0 15 NaN 15
1 20 NaN 20
2 7 NaN 7
3 7 NaN 7
4 7 NaN 7
I do not want to do a df.fillna(0). For a sum I would try something like the following, and it works, but there is no equivalent for a difference:
>>> df['Sum'] = df[['INVOICED_QUANTITY', 'QUANTITY_SHIPPED']].sum(axis=1)
>>> df
INVOICED_QUANTITY QUANTITY_SHIPPED Diff Sum
0 15 NaN NaN 15
1 20 NaN NaN 20
2 7 NaN NaN 7
3 7 NaN NaN 7
4 7 NaN NaN 7
Answer 1:
You can use the sub method to perform the subtraction. This method allows NaN values to be treated as a specified value:
df['Diff'] = df['INVOICED_QUANTITY'].sub(df['QUANTITY_SHIPPED'], fill_value=0)
Which produces:
INVOICED_QUANTITY QUANTITY_SHIPPED Diff
0 15 NaN 15
1 20 NaN 20
2 7 NaN 7
3 7 NaN 7
4 7 NaN 7
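To see how fill_value behaves when a value is missing on either side, here is a minimal, self-contained sketch. The data is made up for illustration and is not the asker's frame; only the column names are taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (not from the question): each row exercises
# a different combination of present and missing values.
df = pd.DataFrame({
    "INVOICED_QUANTITY": [15, 20, np.nan, np.nan],
    "QUANTITY_SHIPPED": [np.nan, 5, 3, np.nan],
})

# fill_value=0 substitutes 0 for a NaN on either side before subtracting.
diff = df["INVOICED_QUANTITY"].sub(df["QUANTITY_SHIPPED"], fill_value=0)

print(diff.tolist())
```

The first three rows come out as 15.0, 15.0 and -3.0. When both values in a row are missing, as in the last row, the result stays NaN rather than becoming 0.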
The other neat way to do this is as @JianxunLi suggests: fill in the missing values in the column (creating a copy of the column) and subtract as normal.
The two approaches are almost the same, although sub is a little more efficient because it doesn't need to produce a copy of the column in advance; it just fills the missing values "on the fly":
In [46]: %timeit df['INVOICED_QUANTITY'] - df['QUANTITY_SHIPPED'].fillna(0)
10000 loops, best of 3: 144 µs per loop
In [47]: %timeit df['INVOICED_QUANTITY'].sub(df['QUANTITY_SHIPPED'], fill_value=0)
10000 loops, best of 3: 81.7 µs per loop
Answer 2:
I think simply filling the NaN values with 0 would help you out.
df['Diff'] = df['INVOICED_QUANTITY'] - df['QUANTITY_SHIPPED'].fillna(0)
Out[153]:
INVOICED_QUANTITY QUANTITY_SHIPPED Diff
0 15 NaN 15
1 20 NaN 20
2 7 NaN 7
3 7 NaN 7
4 7 NaN 7
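Since the question says a df.fillna(0) is not wanted, it may help to check that calling fillna on a single column only builds a temporary Series and leaves the stored column alone. A minimal sketch, assuming made-up sample values (only the column names come from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data for illustration.
df = pd.DataFrame({
    "INVOICED_QUANTITY": [15, 20, 7],
    "QUANTITY_SHIPPED": [np.nan, 4, np.nan],
})

# fillna(0) returns a new Series; it is used for the subtraction and then discarded.
df["Diff"] = df["INVOICED_QUANTITY"] - df["QUANTITY_SHIPPED"].fillna(0)

# The original column still contains its NaN values.
remaining_nans = df["QUANTITY_SHIPPED"].isna().sum()

print(remaining_nans)
print(df["Diff"].tolist())
```

Here QUANTITY_SHIPPED still has two NaN entries after the subtraction, and Diff holds 15.0, 16.0 and 7.0.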
Source: https://stackoverflow.com/questions/31053848/find-difference-between-2-columns-with-nulls-using-pandas