P-value from Chi sq test statistic in Python

浪子不回头ぞ 提交于 2019-12-03 04:11:52

问题


I have computed a test statistic that is distributed as a chi square with 1 degree of freedom, and want to find out what P-value this corresponds to using python.

I'm a python and maths/stats newbie so I think what I want here is the probability denisty function for the chi2 distribution from SciPy. However, when I use this like so:

from scipy import stats
stats.chi2.pdf(3.84 , 1)
0.029846

However some googling and talking to some colleagues who know maths but not python have said it should be 0.05.

Any ideas? Cheers, Davy


回答1:


Quick refresher here:

Probability Density Function: think of it as a point value; how dense is the probability at a given point?

Cumulative Distribution Function: this is the mass of probability of the function up to a given point; what percentage of the distribution lies on one side of this point?

In your case, you took the PDF, for which you got the correct answer. If you try 1 - CDF:

>>> 1 - stats.chi2.cdf(3.84, 1)
0.050043521248705147

PDF CDF




回答2:


To calculate probability of null hypothesis given chisquared sum, and degrees of freedom you can also call chisqprob:

>>> from scipy.stats import chisqprob
>>> chisqprob(3.84, 1)
0.050043521248705189

Notice:

chisqprob is deprecated! stats.chisqprob is deprecated in scipy 0.17.0; use stats.distributions.chi2.sf instead




回答3:


While stats.chisqprob() and 1-stats.chi2.cdf() appear comparable for small chi-square values, for large chi-square values the former is preferable. The latter cannot provide a p-value smaller than machine epsilon,and will give very inaccurate answers close to machine epsilon. As shown by others, comparable values result for small chi-squared values with the two methods:

>>>from scipy.stats import chisqprob, chi2
>>>chisqprob(3.84,1)
0.050043521248705189
>>>1 - chi2.cdf(3.84,1)
0.050043521248705147

Using 1-chi2.cdf() breaks down here:

>>>1 - chi2.cdf(67,1)
2.2204460492503131e-16
>>>1 - chi2.cdf(68,1)
1.1102230246251565e-16
>>>1 - chi2.cdf(69,1)
1.1102230246251565e-16
>>>1 - chi2.cdf(70,1)
0.0

Whereas chisqprob() gives you accurate results for a much larger range of chi-square values, producing p-values nearly as small as the smallest float greater than zero, until it too underflows:

>>>chisqprob(67,1)
2.7150713219425247e-16
>>>chisqprob(68,1)
1.6349553217245471e-16
>>>chisqprob(69,1)
9.8463440314253303e-17    
>>>chisqprob(70,1)
5.9304458500824782e-17
>>>chisqprob(500,1)
9.505397766554137e-111
>>>chisqprob(1000,1)
1.7958327848007363e-219
>>>chisqprob(1424,1)
1.2799986253099803e-311
>>>chisqprob(1425,1)
0.0

Update: as noted, chisqprob() is deprecated for scipy version 0.17.0 onwards. High accuracy chi-square values can now be obtained via scipy.stats.distributions.chi2.sf(), for example:

>>>from scipy.stats.distributions import chi2
>>>chi2.sf(3.84,1)
0.050043521248705189
>>>chi2.sf(1424,1)
1.2799986253099803e-311



回答4:


You meant to do:

>>> 1 - stats.chi2.cdf(3.84, 1)
0.050043521248705147



回答5:


Some of the other solutions are deprecated. Use scipy.stats.chi2 Survival Function. Which is the same as 1 - cdf(chi_statistic, df)

Example:

from scipy.stats import chi2
p_value = chi2.sf(chi_statistic, df)



回答6:


If you want to understand the math, the p-value of a sample, x (fixed), is

P[P(X) <= P(x)] = P[m(X) >= m(x)] = 1 - G(m(x)^2)

where,

  • P is the probability of a (say k-variate) normal distribution w/ known covariance (cov) and mean,
  • X is a random variable from that normal distribution,
  • m(x) is the mahalanobis distance = sqrt( < cov^{-1} (x-mean), x-mean >. Note that in 1-d this is just the absolute value of the z-score.
  • G is the CDF of the chi^2 distribution w/ k degrees of freedom.

So if you're computing the p-value of a fixed observation, x, then you compute m(x) (generalized z-score), and 1-G(m(x)^2).

for example, it's well known that if x is sampled from a univariate (k = 1) normal distribution and has z-score = 2 (it's 2 standard deviations from the mean), then the p-value is about .046 (see a z-score table)

In [7]: from scipy.stats import chi2

In [8]: k = 1

In [9]: z = 2

In [10]: 1-chi2.cdf(z**2, k)
Out[10]: 0.045500263896358528


来源:https://stackoverflow.com/questions/11725115/p-value-from-chi-sq-test-statistic-in-python

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!