Question
I have some code and do not understand why the results differ.
np.std() uses ddof=0 by default when it is called on its own.
So why does it switch to ddof=1 automatically when it is passed as an argument, as in pivot_table(aggfunc=np.std)?
import numpy as np
import pandas as pd
dft = pd.DataFrame({'A': ['one', 'one'],
                    'B': ['A', 'A'],
                    'C': ['bar', 'bar'],
                    'D': [-0.866740402, 1.490732028]})
np.std(dft['D'])
# equivalent to np.std([-0.866740402, 1.490732028]), i.e. the default ddof=0
# result: 1.178736215
dft.pivot_table(index=['A', 'B'], columns='C', aggfunc=np.std)
# equivalent to np.std([-0.866740402, 1.490732028], ddof=1)
# result: 1.666985
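To confirm that the two numbers differ only in the ddof divisor, here is a small check using NumPy alone. The standard deviation divides the sum of squared deviations by N - ddof, so with N = 2 values the ddof=1 result is the ddof=0 result multiplied by sqrt(2). The variable names are illustrative only.

```python
import math

import numpy as np

values = np.array([-0.866740402, 1.490732028])

# Population standard deviation: divide by N (NumPy's default, ddof=0)
pop_std = np.std(values)
# Sample standard deviation: divide by N - 1 (the pandas default, ddof=1)
sample_std = np.std(values, ddof=1)

# The same quantities computed by hand from the definition
mean = values.mean()
sq_dev = ((values - mean) ** 2).sum()
manual_pop = math.sqrt(sq_dev / len(values))
manual_sample = math.sqrt(sq_dev / (len(values) - 1))

print(round(pop_std, 6), round(sample_std, 6))  # 1.178736 1.666985
# The ratio is sqrt(N / (N - 1)), which is sqrt(2) for N = 2
print(sample_std / pop_std)
```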
Answer 1:
pivot_table uses DataFrame.groupby(...).agg, and when you supply an aggregation function, pandas tries to work out exactly how to aggregate (see the internal _aggregate method).
arg=np.std is handled in that method; the relevant code is:
f = self._get_cython_func(arg)
if f and not args and not kwargs:
    return getattr(self, f)(), None
Hidden in the DataFrame class is this table:
pd.DataFrame()._cython_table
#OrderedDict([(<function sum>, 'sum'),
# (<function max>, 'max'),
# ...
# (<function numpy.std>, 'std'),
# (<function numpy.nancumsum>, 'cumsum')])
pd.DataFrame()._cython_table.get(np.std)
#'std'
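Putting the snippet and the table together, the dispatch can be sketched roughly as follows. This is a hypothetical, simplified model, not pandas' actual code: `CYTHON_TABLE` and `aggregate` are illustrative names, and the real table is private and differs between pandas versions. It shows why a callable found in the table only acts as a key for choosing a pandas method.

```python
import numpy as np
import pandas as pd

# Hypothetical, simplified stand-in for the private pandas lookup table;
# the real table's location and contents vary between pandas versions.
CYTHON_TABLE = {np.sum: "sum", np.max: "max", np.mean: "mean", np.std: "std"}


def aggregate(obj, func, *args, **kwargs):
    """Hypothetical sketch of the dispatch shown in the snippet above."""
    name = CYTHON_TABLE.get(func)
    if name and not args and not kwargs:
        # The callable is only a lookup key: the pandas method is called
        # with pandas' own defaults (for Series.std that means ddof=1).
        return getattr(obj, name)()
    # Otherwise the callable itself is applied, with any extra arguments.
    return func(obj, *args, **kwargs)


s = pd.Series([-0.866740402, 1.490732028])
print(aggregate(s, np.std))          # dispatched to s.std(), i.e. ddof=1
print(aggregate(s, np.std, ddof=0))  # kwargs given, so np.std(s, ddof=0) runs
```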
So np.std is only used as a key to select which pandas method to call. NumPy's default of ddof=0 is completely ignored, and the pandas default of ddof=1 is used instead:
getattr(dft['D'], 'std')()
#1.6669847417133286
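If you want the population standard deviation (ddof=0) inside pivot_table, one option is to pass a callable that is not in the lookup table, such as a lambda. pandas then applies it as-is, so the ddof argument is respected. Passing the string 'std' makes the ddof=1 behaviour explicit. Newer pandas releases may also warn that a NumPy callable like np.std will eventually be applied directly, so the explicit forms are the safer choice. This is a minimal sketch; the values='D' argument (not in the original) just restricts the aggregation to column D.

```python
import numpy as np
import pandas as pd

dft = pd.DataFrame({'A': ['one', 'one'],
                    'B': ['A', 'A'],
                    'C': ['bar', 'bar'],
                    'D': [-0.866740402, 1.490732028]})

# A lambda is not a key in pandas' lookup table, so it is applied as-is
# and ddof=0 (population standard deviation) is respected.
pop = dft.pivot_table(index=['A', 'B'], columns='C', values='D',
                      aggfunc=lambda x: x.std(ddof=0))

# The string 'std' selects the pandas method explicitly (ddof=1).
sample = dft.pivot_table(index=['A', 'B'], columns='C', values='D',
                         aggfunc='std')

print(pop)
print(sample)
```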
Source: https://stackoverflow.com/questions/60647377/why-np-std-and-pivot-tableaggfunc-np-std-return-the-different-result