Question

I was wondering how to calculate skewness and kurtosis correctly in pandas. The values pandas returns from skew() and kurtosis() differ noticeably from the ones scipy.stats returns. Which one should I trust, pandas or scipy.stats?

Here is my code:

import numpy as np
import scipy.stats as stats
import pandas as pd

np.random.seed(100)
x = np.random.normal(size=(20))

kurtosis_scipy = stats.kurtosis(x)
kurtosis_pandas = pd.DataFrame(x).kurtosis()[0]

print(kurtosis_scipy, kurtosis_pandas)
# -0.5270409758168872 -0.31467107631025604

skew_scipy = stats.skew(x)
skew_pandas = pd.DataFrame(x).skew()[0]

print(skew_scipy, skew_pandas)
# -0.41070929017558555 -0.44478877631598901

Versions:

import scipy  # the code above only binds scipy.stats as `stats`
print(np.__version__, pd.__version__, scipy.__version__)
# 1.11.0 0.20.0 0.19.0

Answer 1:

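Pass bias=False to the scipy functions. By default, scipy.stats.skew and scipy.stats.kurtosis return the biased estimates (bias=True), whereas pandas applies the small-sample bias correction. With bias=False the two libraries agree up to floating-point rounding: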

print(
    stats.kurtosis(x, bias=False), pd.DataFrame(x).kurtosis()[0],
    stats.skew(x, bias=False), pd.DataFrame(x).skew()[0],
    sep='\n'
)

-0.31467107631025515
-0.31467107631025604
-0.4447887763159889
-0.444788776315989
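To make the link between the two sets of numbers explicit, here is a minimal sketch that converts scipy's default (biased) values into the bias-corrected ones using the standard small-sample correction factors. The seed and the names g1, g2, G1 and G2 are chosen only for illustration and are not from the original post.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(0)  # arbitrary seed, for illustration only
x = rng.normal(size=20)
n = len(x)

g1 = stats.skew(x)      # biased skewness (bias=True is the default)
g2 = stats.kurtosis(x)  # biased excess kurtosis

# Standard small-sample corrections applied to the biased estimates
G1 = np.sqrt(n * (n - 1)) / (n - 2) * g1
G2 = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)

print(G1, pd.Series(x).skew())  # the two values should agree
print(G2, pd.Series(x).kurt())  # the two values should agree
```

Applying the same factors by hand to the biased values in the question (n = 20) gives approximately -0.4448 and -0.3147, which are the pandas results.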

Answer 2:

pandas computes the unbiased estimator of the population excess kurtosis. See Wikipedia for the formulas: https://www.wikiwand.com/en/Kurtosis

Calculate kurtosis from scratch

import numpy as np
import pandas as pd
import scipy.stats  # import the submodule explicitly so scipy.stats.kurtosis resolves

x = np.array([0, 3, 4, 1, 2, 3, 0, 2, 1, 3, 2, 0,
              2, 2, 3, 2, 5, 2, 3, 999])
n = len(x)       # sample size
xbar = x.mean()  # sample mean
k2 = x.var(ddof=1)  # unbiased sample variance; NumPy's default ddof=0 gives the biased one
sum_term = ((x-xbar)**4).sum()
factor = (n+1) * n / (n-1) / (n-2) / (n-3)
second = - 3 * (n-1) * (n-1) / (n-2) / (n-3)

first = factor * sum_term / k2 / k2

G2 = first + second
G2 # 19.998428728659768

Calculate kurtosis using numpy/scipy

scipy.stats.kurtosis(x,bias=False) # 19.998428728659757

Calculate kurtosis using pandas

pd.DataFrame(x).kurtosis() # 19.998429

Similarly, you can also calculate skewness.
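As a rough sketch of that skewness calculation, the block below uses the adjusted Fisher-Pearson coefficient, G1 = sqrt(n(n-1)) / (n-2) * m3 / m2^1.5, where m2 and m3 are the biased central moments. The variable names are chosen here for illustration. The result is compared against scipy with bias=False and against pandas.

```python
import numpy as np
import pandas as pd
from scipy import stats

x = np.array([0, 3, 4, 1, 2, 3, 0, 2, 1, 3, 2, 0,
              2, 2, 3, 2, 5, 2, 3, 999])
n = len(x)
xbar = x.mean()

# Biased (population) central moments
m2 = ((x - xbar) ** 2).mean()
m3 = ((x - xbar) ** 3).mean()

g1 = m3 / m2 ** 1.5                        # biased skewness
G1 = np.sqrt(n * (n - 1)) / (n - 2) * g1   # bias-corrected skewness

print(G1)
print(stats.skew(x, bias=False))  # should match G1
print(pd.Series(x).skew())        # should match G1
```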

Source: https://stackoverflow.com/questions/56758125/how-to-find-skewness-and-kurtosis-correctly-in-pandas

Tags

python

pandas

scipy

How to find skewness and kurtosis correctly in pandas?