Question
The data I'm using is pasted below. When I apply the basic formula for skewness to my data in R:
3*(mean(data) - median(data))/sd(data)
The result is -0.07949198, and I get a very similar result in Python. The median is therefore greater than the mean, suggesting the left tail is longer.
However, when I apply the descdist function from the fitdistrplus package, the skewness is 0.3076471, suggesting the right tail is longer. The SciPy function skew likewise returns a positive skewness of 0.303.
Can I trust the simple formula, which gives me a negative skewness? What is going on here?
Thanks, Oliver
data = c(0.18941565600882029, 1.9861271676300578, -5.2022598870056491, 1.6826411075612353, 1.6826411075612353, -2.9502890173410403, -2.923253150057274, -2.9778296382730454, 0.71202396234488663, 0.71202396234488663, -3.1281373844121529, 1.8326831382748159, -5.2961554710604135, 2.7793190416141234, 0.46922759190417185, 7.0730158730158728, 1.1745152354570636, 2.8142292490118579, 2.037940379403794, 7.0607489597780866, 10.460258249641321, 11.894978479196554, 4.8334682860998655, 1.3884016973125886, 4.0940458015267174, 0.12592959841348539, -0.37022332506203476, 1.9713554987212274, -0.83774145616641893, -1.896978417266187, 6.4340675477239362, -6.4774193548387089, -0.31790393013100438, -4.4193265007320646, 5.7454545454545451, 2.5913432835820895, 0.86190724335591451, 0.95753781950965045, 6.8923556942277697, 1.7650659630606862, -2.4558421851289833, -2.390546528803545, 2.6355029585798815, 0.26983655274888557, 1.5032159264931086, 3.9839506172839503, -5.1404511278195484, -2.2477777777777779, 6.0604444444444443, -0.9691172451489477, 1.1383462670591382, -1.5281319661168078, 4.7775667118950702, 1.2223175965665234, 2.0563555555555553, -3.6153201970443352, -0.35731206188058978, -3.6265094676670238, 1.3053804930332262, -4.4604960677555958, -0.8933514246947083, 0.7622542595019659, 1.3892170651664322, 2.5725258493353031, -0.028006088280060883, 0.8933947772657449, 2.4907086614173228, 3.0914196567862717, 4.4222575516693157, 0.64568527918781726, 0.97095158597662778, -3.7409780775716697, -3.3472636815920396, -0.66307448494453247, -7.0384291725105186, -0.14540612516644474, -0.38161535029004906, 5.1076923076923082, 4.0237516869095806, 1.510099573257468, 1.5064083457526081, -0.025879043600562587, 4.5001414427156998, 3.2326264274061991, 1.0185639229422065, 2.66690518783542, 0.53032015065913374, 1.2117829457364342, 0.60861244019138749, -2.5248049921996878, 1.8666666666666669, -0.32978612415232139, 0.29055999999999998, 1.9150729335494328, 2.2988352745424296, 3.779225265235628, 
0.093884800811976657, 1.0097869890616005, 1.2220632081097198, 0.21164401128494487)
Answer 1:
I don't have access to the packages you mention right now, so I can't check which formula they apply. However, you seem to be using Pearson's second skewness coefficient (see the Wikipedia article on skewness). The usual estimator of the sample skewness is given on the same page; it is based on the third central moment and can be computed simply as:
> S <- mean((data-mean(data))^3)/sd(data)^3
> S
[1] 0.2984792
> n <- length(data)
> S_alt <- S*n^2/((n-1)*(n-2))
> S_alt
[1] 0.3076471
See the alternative definition on the Wikipedia page, which yields the same result as descdist in your example.
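For Python readers, here is a minimal sketch of the same calculation using only the standard library. The function names and the small sample are made up for illustration; this is not the code of fitdistrplus or SciPy. It only shows how the plain moment ratio, the mixed version computed as S above, and the adjusted version S_alt relate to one another.

```python
import math

def central_moment(xs, k):
    """k-th central moment with divisor n."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** k for x in xs) / n

def skew_plain(xs):
    """Third central moment over the cubed standard deviation, both with divisor n."""
    return central_moment(xs, 3) / central_moment(xs, 2) ** 1.5

def skew_mixed(xs):
    """Like S above: third moment with divisor n, standard deviation with divisor n - 1."""
    n = len(xs)
    sd_sample = math.sqrt(central_moment(xs, 2) * n / (n - 1))
    return central_moment(xs, 3) / sd_sample ** 3

def skew_adjusted(xs):
    """Like S_alt above: the mixed value rescaled by n^2 / ((n - 1)(n - 2))."""
    n = len(xs)
    return skew_mixed(xs) * n ** 2 / ((n - 1) * (n - 2))

# Hypothetical right-skewed sample, not the data from the question.
sample = [1.0, 2.0, 2.0, 3.0, 9.0]
n = len(sample)

# The adjusted value is also the plain value times sqrt(n(n-1)) / (n-2).
same = math.isclose(skew_adjusted(sample),
                    skew_plain(sample) * math.sqrt(n * (n - 1)) / (n - 2))
print(skew_mixed(sample), skew_plain(sample), skew_adjusted(sample), same)
```

For a positively skewed sample the three values are ordered mixed < plain < adjusted, and the gaps shrink as n grows, which is consistent with the three numbers in this thread being close to each other.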
Answer 2:
Skewness is generally defined through the third central moment, standardized by the cube of the standard deviation (at least when it is being used by statisticians). The Wikipedia skewness page explains why the definition you found is unreliable. (I had never seen that definition.) The code in descdist is easy to review:
moment <- function(data, k) {
  m1 <- mean(data)  # so this is a "central moment"
  return(sum((data - m1)^k) / length(data))
}
skewness <- function(data) {
  sd <- sqrt(moment(data, 2))
  return(moment(data, 3) / sd^3)
}
skewness(data)
#[1] 0.3030131
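This is the uncorrected sample value and matches the 0.303 from SciPy; the 0.3076471 in the question additionally includes a small-sample correction factor of sqrt(n(n-1))/(n-2), with n = 100 here.
The version you used is apparently called 'median skewness' or 'non-parametric skewness'. See: https://stats.stackexchange.com/questions/159098/taming-of-the-skew-why-are-there-so-many-skew-functions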
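To see that the two definitions can disagree in sign, here is a small Python sketch using only the standard library. The function names and the six-point sample are hypothetical and chosen for illustration; they are not the data from the question.

```python
import statistics

def pearson_median_skewness(xs):
    """3 * (mean - median) / sd, the formula used in the question."""
    return 3 * (statistics.mean(xs) - statistics.median(xs)) / statistics.stdev(xs)

def moment_skewness(xs):
    """Third central moment over the cubed standard deviation (divisor n)."""
    n = len(xs)
    mean = statistics.mean(xs)
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

# Hypothetical sample: mean 0.5 is below median 1.0, yet the single
# large value 4.0 dominates the cubed deviations.
sample = [-2.0, -2.0, 1.0, 1.0, 1.0, 4.0]

print(pearson_median_skewness(sample))  # negative, because mean < median
print(moment_skewness(sample))          # positive, because the cubed deviations sum to a positive number
```

The median-based formula only looks at where the mean sits relative to the median, while the moment-based one weights each deviation by its cube, so a few far-out points on one side can flip the sign.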
Source: https://stackoverflow.com/questions/38782203/inconsistent-skewness-results-between-basic-skewness-formula-python-and-r