Fitting negative binomial in python

前端 未结 3 1181
北海茫月
北海茫月 2021-02-20 14:34

In scipy there is no support for fitting a negative binomial distribution using data (maybe due to the fact that the negative binomial in scipy is only discrete).

For a

3条回答
  •  走了就别回头了
    2021-02-20 14:48

    Not only because it is discrete, also because maximum likelihood fit to negative binomial can be quite involving, especially with an additional location parameter. That would be the reason why .fit() method is not provided for it (and other discrete distributions in Scipy), here is an example:

    In [163]:
    
    import scipy.stats as ss
    import scipy.optimize as so
    In [164]:
    #define a likelihood function
    def likelihood_f(P, x, neg=1):
        n=np.round(P[0]) #by definition, it should be an integer 
        p=P[1]
        loc=np.round(P[2])
        return neg*(np.log(ss.nbinom.pmf(x, n, p, loc))).sum()
    In [165]:
    #generate a random variable
    X=ss.nbinom.rvs(n=100, p=0.4, loc=0, size=1000)
    In [166]:
    #The likelihood
    likelihood_f([100,0.4,0], X)
    Out[166]:
    -4400.3696690513316
    In [167]:
    #A simple fit, the fit is not good and the parameter estimate is way off
    result=so.fmin(likelihood_f, [50, 1, 1], args=(X,-1), full_output=True, disp=False)
    P1=result[0]
    (result[1], result[0])
    Out[167]:
    (4418.599495886474, array([ 59.61196161,   0.28650831,   1.15141838]))
    In [168]:
    #Try a different set of start paramters, the fit is still not good and the parameter estimate is still way off
    result=so.fmin(likelihood_f, [50, 0.5, 0], args=(X,-1), full_output=True, disp=False)
    P1=result[0]
    (result[1], result[0])
    Out[168]:
    (4417.1495981801972,
     array([  6.24809397e+01,   2.91877405e-01,   6.63343536e-04]))
    In [169]:
    #In this case we need a loop to get it right
    result=[]
    for i in range(40, 120): #in fact (80, 120) should probably be enough
        _=so.fmin(likelihood_f, [i, 0.5, 0], args=(X,-1), full_output=True, disp=False)
        result.append((_[1], _[0]))
    In [170]:
    #get the MLE
    P2=sorted(result, key=lambda x: x[0])[0][1]
    sorted(result, key=lambda x: x[0])[0]
    Out[170]:
    (4399.780263084549,
     array([  9.37289361e+01,   3.84587087e-01,   3.36856705e-04]))
    In [171]:
    #Which one is visually better?
    plt.hist(X, bins=20, normed=True)
    plt.plot(range(260), ss.nbinom.pmf(range(260), np.round(P1[0]), P1[1], np.round(P1[2])), 'g-')
    plt.plot(range(260), ss.nbinom.pmf(range(260), np.round(P2[0]), P2[1], np.round(P2[2])), 'r-')
    Out[171]:
    []
    

    enter image description here

提交回复
热议问题