Estimate confidence intervals for parameters of a distribution in Python

Question

Is there a built-in function in some Python package that provides confidence intervals for parameter estimates, or is this something I will need to implement by hand? I am looking for something similar to MATLAB's gevfit: http://www.mathworks.com/help/stats/gevfit.html.

Answer 1:

Take a look at SciPy and NumPy in case you haven't already. If you have some familiarity with MATLAB, the switch should be relatively easy. I've taken this quick snippet from this Stack Overflow response:

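import numpy as np
from scipy import stats

def mean_confidence_interval(data, confidence=0.95):
    """Return the sample mean and its lower and upper confidence bounds."""
    a = np.asarray(data, dtype=float)
    n = len(a)
    # sample mean and standard error of the mean
    m, se = np.mean(a), stats.sem(a)
    # half-width from the two-sided Student's t quantile with n-1 degrees of freedom
    h = se * stats.t.ppf((1 + confidence) / 2., n - 1)
    return m, m - h, m + h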

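You should be able to customize the return values to your liking. Like the MATLAB gevfit function, it defaults to 95% confidence bounds. Note that this function gives a Student's t interval for the sample mean, not intervals for the fitted parameters of a distribution.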

Answer 2:

The bootstrap can be used to estimate confidence intervals for any function of a sample (np.mean, st.genextreme.fit, etc.), and there is a Python library for it: scikits.bootstrap.

Here is an example using the data from the question author's related question:

import numpy as np, scipy.stats as st, scikits.bootstrap as boot
data = np.array([ 22.20379411,  22.99151292,  24.27032696,  24.82180626,
  25.23163221,  25.39987272,  25.54514567,  28.56710007,
  29.7575898 ,  30.15641696,  30.79168255,  30.88147532,
  31.0236419 ,  31.17380647,  31.61932755,  32.23452568,
  32.76262978,  33.39430032,  33.81080069,  33.90625861,
  33.99142006,  35.45748368,  37.0342621 ,  37.14768791,
  38.14350221,  42.72699534,  44.16449992,  48.77736737,
  49.80441736,  50.57488779])

st.genextreme.fit(data)   # just to check the parameters
boot.ci(data, st.genextreme.fit)

The results are

(-0.014387281261850815, 29.762126238637851, 5.8983127779873605)
array([[ -0.40002507,  26.93511496,   4.6677834 ],
       [  0.19743722,  32.41834882,   9.05026202]])

The bootstrap takes about three minutes on my machine: by default, boot.ci uses 10,000 bootstrap iterations (the n_samples argument; see the source code or help(boot.ci)), and st.genextreme.fit is not particularly fast.
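To make the resampling idea concrete without the extra dependency, here is a minimal sketch of a plain percentile bootstrap using only NumPy and SciPy. The helper name percentile_bootstrap_ci and its arguments are made up for this illustration, and the percentile interval is a simpler method than what a dedicated library may use, so its numbers will not reproduce the boot.ci output above. The resample count is kept deliberately small so that it runs quickly.

```python
import numpy as np
import scipy.stats as st

data = np.array([22.20379411, 22.99151292, 24.27032696, 24.82180626,
                 25.23163221, 25.39987272, 25.54514567, 28.56710007,
                 29.7575898, 30.15641696, 30.79168255, 30.88147532,
                 31.0236419, 31.17380647, 31.61932755, 32.23452568,
                 32.76262978, 33.39430032, 33.81080069, 33.90625861,
                 33.99142006, 35.45748368, 37.0342621, 37.14768791,
                 38.14350221, 42.72699534, 44.16449992, 48.77736737,
                 49.80441736, 50.57488779])


def percentile_bootstrap_ci(data, stat_func, n_resamples=1000, alpha=0.05, seed=0):
    """Hypothetical helper: percentile bootstrap interval for each value
    returned by stat_func (a scalar or a tuple of parameters)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    estimates = []
    for _ in range(n_resamples):
        # draw a sample of the same size, with replacement
        resample = rng.choice(data, size=data.size, replace=True)
        try:
            estimates.append(np.atleast_1d(stat_func(resample)))
        except (RuntimeError, ValueError):
            # skip the occasional resample on which the fit fails
            continue
    estimates = np.array(estimates)
    # row 0: lower bounds, row 1: upper bounds, one column per parameter
    return np.percentile(estimates, [100 * alpha / 2, 100 * (1 - alpha / 2)], axis=0)


# 100 resamples keeps the run short; the tail percentiles are crude at this size
ci = percentile_bootstrap_ci(data, st.genextreme.fit, n_resamples=100)
print(ci)
```

Raising n_resamples into the thousands gives more stable bounds at a proportional cost in run time, since every resample triggers a full maximum likelihood fit.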

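The confidence intervals from boot.ci do not match the ones from MATLAB's gevfit exactly. For example, MATLAB gives a symmetric interval, [-0.3032, 0.3320], for the first parameter (0.0144). Note that scipy's genextreme uses the opposite sign convention for the shape parameter, so MATLAB's 0.0144 corresponds to the -0.0144 shown above.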
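A symmetric interval is what a normal (Wald) approximation around the maximum likelihood estimate would produce. Whether gevfit computes its intervals exactly this way is an assumption here, not something confirmed above. The sketch below estimates standard errors from a finite-difference Hessian of the negative log-likelihood; the helper names neg_log_likelihood and numerical_hessian are made up for this illustration, and the step size is a guess that may need tuning for other data.

```python
import numpy as np
import scipy.stats as st

data = np.array([22.20379411, 22.99151292, 24.27032696, 24.82180626,
                 25.23163221, 25.39987272, 25.54514567, 28.56710007,
                 29.7575898, 30.15641696, 30.79168255, 30.88147532,
                 31.0236419, 31.17380647, 31.61932755, 32.23452568,
                 32.76262978, 33.39430032, 33.81080069, 33.90625861,
                 33.99142006, 35.45748368, 37.0342621, 37.14768791,
                 38.14350221, 42.72699534, 44.16449992, 48.77736737,
                 49.80441736, 50.57488779])


def neg_log_likelihood(theta, x):
    """Negative GEV log-likelihood in scipy's (c, loc, scale) parameterization."""
    c, loc, scale = theta
    return -np.sum(st.genextreme.logpdf(x, c, loc=loc, scale=scale))


def numerical_hessian(func, theta, rel_step=1e-4):
    """Hypothetical helper: central-difference Hessian of func at theta."""
    theta = np.asarray(theta, dtype=float)
    k = theta.size
    steps = rel_step * np.maximum(1.0, np.abs(theta))
    hess = np.empty((k, k))
    for i in range(k):
        for j in range(k):
            ei = np.zeros(k)
            ej = np.zeros(k)
            ei[i] = steps[i]
            ej[j] = steps[j]
            hess[i, j] = (func(theta + ei + ej) - func(theta + ei - ej)
                          - func(theta - ei + ej) + func(theta - ei - ej)
                          ) / (4.0 * steps[i] * steps[j])
    return hess


theta_hat = np.array(st.genextreme.fit(data))   # (c, loc, scale)
hess = numerical_hessian(lambda t: neg_log_likelihood(t, data), theta_hat)
cov = np.linalg.inv(hess)                       # approximate covariance of the MLE
se = np.sqrt(np.diag(cov))                      # standard errors
z = st.norm.ppf(0.975)                          # two-sided 95% normal quantile
wald_ci = np.vstack([theta_hat - z * se, theta_hat + z * se])
print(theta_hat)
print(wald_ci)
```

Unlike the bootstrap, this needs only one fit, but it relies on the log-likelihood being close to quadratic near the estimate, which is a strong assumption for 30 observations. It can also yield bounds outside the valid range, such as a negative lower bound for the scale.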

Source: https://stackoverflow.com/questions/31481279/estimate-confidence-intervals-for-parameters-of-distribution-in-python

Tags

python

matlab

scipy