I am trying to use groupby and np.std to calculate a standard deviation, but it seems to be calculating a sample standard deviation (with degrees of freedom equal to 1).
I want the result for degrees of freedom = 0. (This means that bins with one number will end up with std=0 instead of NaN.)
import numpy as np

def std(x):
    return np.std(x)

df.groupby('A').agg(['mean', 'max', std])
You can pass additional args to np.std in the agg function:
In [202]:
df.groupby('A').agg(np.std, ddof=0)
Out[202]:
     B  values
A
1  0.5     2.5
2  0.5     2.5
In [203]:
df.groupby('A').agg(np.std, ddof=1)
Out[203]:
          B    values
A
1  0.707107  3.535534
2  0.707107  3.535534
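
For reference, here is a minimal self-contained sketch of the same idea. The data is hypothetical and not the DataFrame from the question; the name std0 is only an illustration. It uses the ddof argument of the groupby std method instead of passing np.std to agg, which may be the safer spelling on recent pandas versions, and it includes a single-row group to show the std=0 versus NaN difference mentioned in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data (an assumption, not the DataFrame from the question):
# groups 1 and 2 have two rows each, group 3 has a single row.
df = pd.DataFrame({
    "A": [1, 1, 2, 2, 3],
    "B": [1, 2, 3, 4, 5],
    "values": [10, 15, 20, 25, 30],
})

# Population standard deviation (ddof=0) of every column, per group.
pop_std = df.groupby("A").std(ddof=0)

# Sample standard deviation (ddof=1), which is the pandas default.
sample_std = df.groupby("A").std(ddof=1)


def std0(x):
    """Population standard deviation of one group's column (a Series)."""
    return x.std(ddof=0)


# Mix built-in aggregations with the custom one, as in the question.
summary = df.groupby("A").agg(["mean", "max", std0])

print(pop_std)
print(sample_std)
print(summary)
```

With ddof=0 the single-row group 3 gets a standard deviation of 0, while with ddof=1 it comes out as NaN. In summary, the custom column is labelled with the function's name, std0.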