Question
Thanks in advance to anyone who takes a look.
My code is:
import numpy as np
from scipy.stats import kstest
data=[31001, 38502, 40842, 40852, 43007, 47228, 48320, 50500, 54545, 57437, 60126, 65556, 71215, 78460, 81299, 96851, 106472, 108398, 118495, 130832, 141678, 155703, 180689, 218032, 222238, 239553, 250895, 274025, 298231, 330228, 330910, 352058, 362993, 369690, 382487, 397270, 414179, 454013, 504993, 518475, 531767, 551032, 782483, 913658, 1432195, 1712510, 2726323, 2777535, 3996759, 13608152]
x=np.array(data)
test_sta=kstest(x, 'norm')
print(test_sta)
The result of kstest is KstestResult(statistic=1.0, pvalue=0.0). Is there something wrong with my code, or is the data simply not normal at all?
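Answer 1:
I've not used this before, but I think you're testing whether your data is standard normal (i.e. mean = 0, variance = 1): with no extra arguments, kstest(x, 'norm') compares against N(0, 1). Every one of your values is above 30,000, so the standard-normal CDF is effectively 1 at all of them while the empirical CDF starts at 0. The largest gap between the two (the KS statistic) is therefore 1.0, and the p-value is 0.
Plotting a histogram shows the data to be much closer to log-normal, so I'd do: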
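The reason for statistic=1.0 can be checked directly. This sketch, using the question's data and scipy.stats.norm, recomputes the two-sided KS statistic by hand: the standard-normal CDF evaluates to 1.0 at every data point, while the empirical CDF is 0 just before the smallest value, so the largest gap is exactly 1.

```python
import numpy as np
from scipy.stats import norm, kstest

# The data from the question
data = [31001, 38502, 40842, 40852, 43007, 47228, 48320, 50500, 54545, 57437,
        60126, 65556, 71215, 78460, 81299, 96851, 106472, 108398, 118495, 130832,
        141678, 155703, 180689, 218032, 222238, 239553, 250895, 274025, 298231, 330228,
        330910, 352058, 362993, 369690, 382487, 397270, 414179, 454013, 504993, 518475,
        531767, 551032, 782483, 913658, 1432195, 1712510, 2726323, 2777535, 3996759, 13608152]

x = np.sort(np.array(data, dtype=float))
n = len(x)

# Standard normal CDF at each observation; every value is >= 31001, so this is 1.0 everywhere
cdf = norm.cdf(x)

# Two-sided KS statistic: the largest gap between the empirical CDF and the reference CDF,
# checked just after (i/n) and just before ((i-1)/n) each sorted observation
ecdf_after = np.arange(1, n + 1) / n
ecdf_before = np.arange(0, n) / n
d = max(np.max(ecdf_after - cdf), np.max(cdf - ecdf_before))

print(cdf.min())                    # 1.0
print(d)                            # 1.0
print(kstest(x, 'norm').statistic)  # 1.0
```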
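x = np.log(data)    # log-transform the data
x -= np.mean(x)     # center to mean 0
x /= np.std(x)      # scale to standard deviation 1
kstest(x, 'norm')   # compare with the standard normal N(0, 1)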
This gives me a test statistic of 0.095 and a p-value of 0.75, so we cannot reject the hypothesis that the data is log-normal.
A good way to check this sort of thing is to generate some random data from a known distribution and see what the test gives you back. For example:
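Standardizing by hand is not the only way to run this test. As a sketch (relying on the `args` parameter of `scipy.stats.kstest`, and on `scipy.stats.lognorm` taking shape `s` equal to the standard deviation of the logs and `scale` equal to `exp` of their mean, with `loc` fixed at 0), the three calls below should give the same statistic up to floating-point rounding.

```python
import numpy as np
from scipy.stats import kstest

# The data from the question
data = [31001, 38502, 40842, 40852, 43007, 47228, 48320, 50500, 54545, 57437,
        60126, 65556, 71215, 78460, 81299, 96851, 106472, 108398, 118495, 130832,
        141678, 155703, 180689, 218032, 222238, 239553, 250895, 274025, 298231, 330228,
        330910, 352058, 362993, 369690, 382487, 397270, 414179, 454013, 504993, 518475,
        531767, 551032, 782483, 913658, 1432195, 1712510, 2726323, 2777535, 3996759, 13608152]

logx = np.log(data)
mu, sigma = np.mean(logx), np.std(logx)  # np.std defaults to ddof=0, as in the answer

# 1) Standardize by hand and compare with N(0, 1), as in the answer
r_manual = kstest((logx - mu) / sigma, 'norm')

# 2) Keep the logs unscaled and pass loc=mu, scale=sigma to the normal distribution
r_args = kstest(logx, 'norm', args=(mu, sigma))

# 3) Test the raw data against a log-normal with matching parameters:
#    lognorm(s=sigma, loc=0, scale=exp(mu)) has CDF Phi((ln(x) - mu) / sigma)
r_lognorm = kstest(data, 'lognorm', args=(sigma, 0, np.exp(mu)))

for r in (r_manual, r_args, r_lognorm):
    print(r.statistic, r.pvalue)
```

Because mu and sigma are estimated from the same 50 values being tested, the plain KS p-value is conservative here (it tends to be too large); the Lilliefors variant, available for the normal case as statsmodels.stats.diagnostic.lilliefors, corrects for this. The 0.75 is therefore best read as "no evidence against log-normality" rather than as confirmation of it.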
kstest(np.random.normal(size=100), 'norm')
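gives me p-values that are usually well above 0.05 (when the null hypothesis is true, the p-value is uniformly distributed between 0 and 1, so it will occasionally be small by chance), while: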
kstest(np.random.normal(loc=13, size=100), 'norm')
gives me p-values near 0.
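The calibration idea can be taken further by repeating each experiment many times. In this sketch (the seed and the 1000 repetitions are arbitrary choices), p-values for genuinely standard-normal samples spread roughly uniformly over [0, 1] rather than clustering near 1, so about 5% of them fall below 0.05 purely by chance; for the samples shifted to mean 13 they are all essentially 0.

```python
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

# 1000 samples that really are standard normal
null_p = np.array([kstest(rng.normal(size=100), 'norm').pvalue for _ in range(1000)])

# 1000 samples with mean 13, clearly not standard normal
shifted_p = np.array([kstest(rng.normal(loc=13, size=100), 'norm').pvalue
                      for _ in range(1000)])

print(null_p.mean())           # should be roughly 0.5, as for a uniform distribution
print((null_p < 0.05).mean())  # should be roughly 0.05
print(shifted_p.max())         # essentially 0
```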
A log-normal distribution simply means that the data is normally distributed after a log transform. If you really want to test against a normal distribution, just skip the log transform, e.g.:
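x = np.array(data, dtype=float)  # float dtype so the in-place operations below work
x -= np.mean(x)                  # center to mean 0
x /= np.std(x)                   # scale to standard deviation 1
kstest(x, 'norm')                # compare with the standard normal N(0, 1)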
This gives me a p-value of 7e-7, so we can confidently reject the hypothesis that the data is normally distributed.
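To put a number on what the histogram shows without plotting anything, this sketch compares sample skewness (via scipy.stats.skew) before and after the log transform. Skewness is 0 for any symmetric distribution, including the normal, so this is only a rough diagnostic, not a formal test.

```python
import numpy as np
from scipy.stats import skew

# The data from the question
data = [31001, 38502, 40842, 40852, 43007, 47228, 48320, 50500, 54545, 57437,
        60126, 65556, 71215, 78460, 81299, 96851, 106472, 108398, 118495, 130832,
        141678, 155703, 180689, 218032, 222238, 239553, 250895, 274025, 298231, 330228,
        330910, 352058, 362993, 369690, 382487, 397270, 414179, 454013, 504993, 518475,
        531767, 551032, 782483, 913658, 1432195, 1712510, 2726323, 2777535, 3996759, 13608152]

raw = np.array(data, dtype=float)

raw_skew = skew(raw)          # large and positive: a long right tail
log_skew = skew(np.log(raw))  # much closer to 0 after the log transform

print(raw_skew, log_skew)
```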
Source: https://stackoverflow.com/questions/59022661/why-did-my-p-value-equals-0-and-statistic-equals-1-when-i-use-ks-test-in-python