P-value from Chi sq test statistic in Python

Anonymous (unverified), submitted 2019-12-03 02:45:02

Question:

I have computed a test statistic that is distributed as a chi-square with 1 degree of freedom, and I want to find the P-value it corresponds to using Python.

I'm a Python and maths/stats newbie, so I think what I want here is the probability density function for the chi2 distribution from SciPy. When I use it like so, I get 0.029846:

>>> from scipy import stats
>>> stats.chi2.pdf(3.84, 1)
0.029846

However, some googling and some colleagues who know maths but not Python say it should be 0.05.

Any ideas? Cheers, Davy

Answer 1:

Quick refresher here:

Probability Density Function: think of it as a point value; how dense is the probability at a given point?

Cumulative Distribution Function: this is the mass of probability of the function up to a given point; what percentage of the distribution lies on one side of this point?

In your case you evaluated the PDF, and the value you got is the correct density at that point, but a density is not a p-value. The p-value is the upper-tail probability, 1 - CDF:

>>> 1 - stats.chi2.cdf(3.84, 1)
0.050043521248705147
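
To make the PDF/CDF distinction concrete, here is a minimal sketch that evaluates all three quantities for the statistic from the question. It assumes a SciPy version that provides `chi2.sf` (the survival function, which is the upper-tail probability); the variable names are only illustrative.

```python
from scipy.stats import chi2

x, df = 3.84, 1  # test statistic and degrees of freedom from the question

density = chi2.pdf(x, df)     # height of the density curve at x, not a probability
lower_tail = chi2.cdf(x, df)  # P(X <= x)
upper_tail = chi2.sf(x, df)   # P(X > x), which is the p-value

print(f"pdf = {density:.6f}")
print(f"cdf = {lower_tail:.6f}")
print(f"sf  = {upper_tail:.6f}")
```

The lower and upper tails add up to 1, and only the upper tail matches the 0.05 the colleagues expected.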




Answer 2:

To calculate the p-value (the probability, under the null hypothesis, of a statistic at least this large) given the chi-squared sum and the degrees of freedom, you can also call chisqprob:

>>> from scipy.stats import chisqprob
>>> chisqprob(3.84, 1)
0.050043521248705189

Notice:

chisqprob is deprecated! stats.chisqprob is deprecated in scipy 0.17.0; use stats.distributions.chi2.sf instead



Answer 3:

While stats.chisqprob() and 1-stats.chi2.cdf() appear comparable for small chi-square values, the former is preferable for large chi-square values. The latter cannot resolve p-values below roughly machine epsilon, and gives very inaccurate answers close to machine epsilon. As shown by others, the two methods give comparable results for small chi-square values:

>>> from scipy.stats import chisqprob, chi2
>>> chisqprob(3.84, 1)
0.050043521248705189
>>> 1 - chi2.cdf(3.84, 1)
0.050043521248705147

Using 1-chi2.cdf() breaks down here:

>>> 1 - chi2.cdf(67, 1)
2.2204460492503131e-16
>>> 1 - chi2.cdf(68, 1)
1.1102230246251565e-16
>>> 1 - chi2.cdf(69, 1)
1.1102230246251565e-16
>>> 1 - chi2.cdf(70, 1)
0.0

Whereas chisqprob() gives you accurate results for a much larger range of chi-square values, producing p-values nearly as small as the smallest float greater than zero, until it too underflows:

>>> chisqprob(67, 1)
2.7150713219425247e-16
>>> chisqprob(68, 1)
1.6349553217245471e-16
>>> chisqprob(69, 1)
9.8463440314253303e-17
>>> chisqprob(70, 1)
5.9304458500824782e-17
>>> chisqprob(500, 1)
9.505397766554137e-111
>>> chisqprob(1000, 1)
1.7958327848007363e-219
>>> chisqprob(1424, 1)
1.2799986253099803e-311
>>> chisqprob(1425, 1)
0.0

Update: as noted, chisqprob() is deprecated from scipy version 0.17.0 onwards. High-accuracy chi-square p-values can now be obtained via scipy.stats.distributions.chi2.sf(), for example:

>>> from scipy.stats.distributions import chi2
>>> chi2.sf(3.84, 1)
0.050043521248705189
>>> chi2.sf(1424, 1)
1.2799986253099803e-311
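
The loss of precision in 1 - CDF can also be reproduced without SciPy, as a rough cross-check. The sketch below relies on a closed form that holds only for 1 degree of freedom: the statistic is then the square of a standard normal, so the upper tail equals erfc(sqrt(x/2)). The two helper names are hypothetical and exist only for this illustration.

```python
import math

def chi2_sf_1df(x):
    """Upper tail P(X > x) for a chi-square variable with 1 degree of freedom."""
    # With 1 df, X = Z**2 for a standard normal Z, so P(X > x) = P(|Z| > sqrt(x)).
    return math.erfc(math.sqrt(x / 2.0))

def chi2_sf_1df_naive(x):
    """Same quantity computed as 1 - CDF, which loses precision in the far tail."""
    return 1.0 - math.erf(math.sqrt(x / 2.0))

# Print the direct tail next to the 1 - CDF version for a few statistics.
for x in (3.84, 67.0, 100.0):
    print(x, chi2_sf_1df(x), chi2_sf_1df_naive(x))
```

Computing the tail directly keeps the tiny value, while subtracting a CDF that has already rounded to 1.0 leaves nothing to recover. The same reasoning is why a survival function is the safer choice for large statistics.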


Answer 4:

You meant to do:

>>> 1 - stats.chi2.cdf(3.84, 1)
0.050043521248705147


Answer 5:

Some of the other solutions are deprecated. Use the survival function of scipy.stats.chi2, which is mathematically the same as 1 - cdf(chi_statistic, df) but stays accurate for very small p-values.

Example:

from scipy.stats import chi2
chi_statistic, df = 3.84, 1  # example values taken from the question
p_value = chi2.sf(chi_statistic, df)


Answer 6:

If you want to understand the math, the p-value of a sample, x (fixed), is

P[P(X) <= P(x)] = P[m(X) >= m(x)] = 1 - G(m(x)^2)

where,

  • P is the probability (the outer P[...]) and, applied to a point, the probability density of a (say k-variate) normal distribution w/ known covariance (cov) and mean,
  • X is a random variable from that normal distribution,
  • m(x) is the Mahalanobis distance = sqrt( < cov^{-1} (x-mean), x-mean > ). Note that in 1-d this is just the absolute value of the z-score.
  • G is the CDF of the chi^2 distribution w/ k degrees of freedom.

So if you're computing the p-value of a fixed observation, x, then you compute m(x) (generalized z-score), and 1-G(m(x)^2).

For example, it's well known that if x is sampled from a univariate (k = 1) normal distribution and has z-score = 2 (it's 2 standard deviations from the mean), then the two-sided p-value is about .046 (see a z-score table):

In [7]: from scipy.stats import chi2

In [8]: k = 1

In [9]: z = 2

In [10]: 1-chi2.cdf(z**2, k)
Out[10]: 0.045500263896358528
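
The same recipe extends to more than one dimension. Below is a minimal sketch, assuming NumPy and SciPy are available; the function name mahalanobis_p_value and the 2-d mean and covariance are made up for illustration, and chi2.sf is used in place of 1 - cdf for better accuracy in the tail.

```python
import numpy as np
from scipy.stats import chi2

def mahalanobis_p_value(x, mean, cov):
    """p-value 1 - G(m(x)^2) of an observation x under a normal N(mean, cov)."""
    diff = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    # Squared Mahalanobis distance < cov^{-1} (x - mean), x - mean >,
    # computed with a linear solve instead of an explicit matrix inverse.
    m_squared = diff @ np.linalg.solve(np.asarray(cov, dtype=float), diff)
    k = diff.size  # degrees of freedom = dimension of x
    return chi2.sf(m_squared, k)

# Hypothetical 2-d example (values chosen only for illustration).
mean = [0.0, 0.0]
cov = [[2.0, 0.5], [0.5, 1.0]]
p = mahalanobis_p_value([2.0, 1.0], mean, cov)
print(p)

# The 1-d case reduces to the z-score example: z = 2 gives about 0.0455.
print(mahalanobis_p_value([2.0], [0.0], [[1.0]]))
```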

