Fittiing For Discrete Data: Negative Binomial, Poisson, Geometric Distribution

问题

In scipy there is no support for fitting discrete distributions using data. I know there are a lot of subject about this.

For example if i have an array like below:

x = [2,3,4,5,6,7,0,1,1,0,1,8,10,9,1,1,1,0,0]

I couldn' t apply for this array;

from scipy.stats import nbinom
param = nbinom.fit(x)

But i would like to ask you up to date, is there any way to fit for these three discrete distributions and then choose the best fit for the discrete dataset?

回答1:

You can use Method of Moments to fit any particular distribution.

Basic idea: get empirical first, second, etc. moments, then derive distribution parameters from these moments.

So, in all these cases we only need two moments. Let's get them:

import pandas as pd
# for other distributions, you'll need to implement PMF
from scipy.stats import nbinom, poisson, geom

x = pd.Series(x)
mean = x.mean()
var = x.var()
likelihoods = {}  # we'll use it later

Note: I used pandas instead of numpy. That is because numpy's var() and std() don't apply Bessel's correction, while pandas' do. If you have 100+ samples, there shouldn't be much difference, but on smaller samples it could be important.

Now, let's get parameters for these distributions. Negative binomial has two parameters: p, r. Let's estimate them and calculate likelihood of the dataset:

# From the wikipedia page, we have:
# mean = pr / (1-p)
# var = pr / (1-p)**2
# without wiki, you could use MGF to get moments; too long to explain here
# Solving for p and r, we get:

p = 1 - mean / var  # TODO: check for zero variance and limit p by [0, 1]
r = (1-p) * mean / p

UPD: Wikipedia and scipy are using different definitions of p, one treating it as probability of success and another as probability of failure. So, to be consistent with scipy notion, use:

p = mean / var
r = p * mean / (1-p)

END OF UPD

Calculate likelihood:

likelihoods['nbinom'] = x.map(lambda val: nbinom.pmf(val, r, p)).prod()

Same for Poisson, there is only one parameter:

# from Wikipedia,
# mean = variance = lambda. Nothing to solve here
lambda_ = mean
likelihoods['poisson'] = x.map(lambda val: poisson.pmf(val, lambda_)).prod()

Same for Geometric distribution:

# mean = 1 / p  # this form fits the scipy definition
p = 1 / mean

likelihoods['geometric'] = x.map(lambda val: geom.pmf(val, p)).prod()

Finally, let's get the best fit:

best_fit = max(likelihoods, key=lambda x: likelihoods[x])
print("Best fit:", best_fit)
print("Likelihood:", likelihoods[best_fit])

Let me know if you have any questions

来源：https://stackoverflow.com/questions/59308441/fittiing-for-discrete-data-negative-binomial-poisson-geometric-distribution

标签

python

scipy

statistics

distribution