scipy, lognormal distribution - parameters

Anonymous (unverified), posted 2019-12-03 09:02:45

Question:

I want to fit a lognormal distribution to my data using Python's scipy.stats.lognorm.fit. According to the manual, fit returns shape, loc and scale parameters. But a lognormal distribution normally needs only two parameters: mean and standard deviation.

How do I interpret the results from the scipy fit function? How do I get the mean and std. dev.?

Answer 1:

The distributions in scipy are coded in a generic way with respect to two parameters, location and scale: location (loc) shifts the distribution to the left or right, while scale compresses or stretches it.
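A minimal sketch of what that generic loc/scale convention means in practice, assuming arbitrary illustrative parameter values (s, loc, scale and the grid x are not from the question): the density with loc and scale should equal the standardized density evaluated at (x - loc) / scale, divided by scale.

```python
import numpy as np
from scipy import stats

# Illustrative values only (assumptions, not taken from the question)
s, loc, scale = 0.5, 2.0, 3.0
x = np.linspace(2.5, 20.0, 50)  # points to the right of loc

# Density with loc/scale passed to scipy directly
direct = stats.lognorm.pdf(x, s, loc=loc, scale=scale)

# Same density built by hand from the standardized (loc=0, scale=1) form
by_hand = stats.lognorm.pdf((x - loc) / scale, s) / scale

print(np.allclose(direct, by_hand))
```

The same shift-and-rescale relationship holds for the other continuous distributions in scipy.stats, which is why fit always reports loc and scale alongside any shape parameters.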

For the two-parameter lognormal distribution, the "mean" and "std dev" (those of the underlying normal, i.e. of the log of the data) correspond to log(scale) and shape (you can let loc=0).

The following illustrates how to fit a lognormal distribution to find the two parameters of interest:

    In [56]: import numpy as np

    In [57]: from scipy import stats

    In [58]: logsample = stats.norm.rvs(loc=10, scale=3, size=1000) # logsample ~ N(mu=10, sigma=3)

    In [59]: sample = np.exp(logsample) # sample ~ lognormal(10, 3)

    In [60]: shape, loc, scale = stats.lognorm.fit(sample, floc=0) # hold location to 0 while fitting

    In [61]: shape, loc, scale
    Out[61]: (2.9212650122639419, 0, 21318.029350592606)

    In [62]: np.log(scale), shape  # mu, sigma
    Out[62]: (9.9673084420467362, 2.9212650122639419)
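If what you want is the mean and standard deviation of the lognormal variable itself (rather than mu and sigma of its log), they can be derived from the fitted parameters. The sketch below assumes a smaller, made-up example (mu=1.0, sigma=0.5, seeded generator) so the numbers stay readable, and compares the standard closed-form expressions with what the frozen scipy distribution reports.

```python
import numpy as np
from scipy import stats

# Illustrative sample (assumed values): log of the data ~ N(mu=1.0, sigma=0.5)
rng = np.random.default_rng(0)
sample = np.exp(rng.normal(loc=1.0, scale=0.5, size=5000))

shape, loc, scale = stats.lognorm.fit(sample, floc=0)  # hold location at 0
mu, sigma = np.log(scale), shape                       # parameters of the log

# Closed-form moments of a two-parameter lognormal
mean_closed = np.exp(mu + sigma**2 / 2)
std_closed = mean_closed * np.sqrt(np.exp(sigma**2) - 1)

# The frozen distribution should report the same moments
fitted = stats.lognorm(shape, loc=loc, scale=scale)
print(mean_closed, fitted.mean())
print(std_closed, fitted.std())
```

Note that scale = exp(mu) is the median of the fitted distribution, not its mean; the mean is larger by the factor exp(sigma**2 / 2).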


Answer 2:

I just spent some time working this out and wanted to document it here: if you want to get the probability density (at point x) from the three return values of lognorm.fit (let's call them (shape, loc, scale)), you need to use this formula:

    pdf = 1 / (shape*((x-loc)/scale)*sqrt(2*pi)) * exp(-1/2*(log((x-loc)/scale)/shape)**2) / scale

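As an equation, with shape written as σ, scale as α, and loc kept by name:

$$\mathrm{pdf}(x) = \frac{1}{\alpha} \cdot \frac{1}{\sigma \, \frac{x - \mathrm{loc}}{\alpha} \, \sqrt{2\pi}} \exp\left( -\frac{1}{2} \left( \frac{\ln\frac{x - \mathrm{loc}}{\alpha}}{\sigma} \right)^{2} \right)$$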
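One way to check the formula above is to code it directly and compare it with scipy's own density. This is only a sketch: the helper name lognorm_pdf and the parameter values are made up for illustration, and the formula is only evaluated for x > loc.

```python
import numpy as np
from scipy import stats

def lognorm_pdf(x, shape, loc, scale):
    """Hypothetical helper: three-parameter lognormal density, valid for x > loc."""
    y = (x - loc) / scale  # standardized variable
    return (1 / (shape * y * np.sqrt(2 * np.pi))
            * np.exp(-0.5 * (np.log(y) / shape) ** 2) / scale)

# Illustrative values only (assumptions)
shape, loc, scale = 0.8, 1.0, 5.0
x = np.linspace(1.5, 40.0, 100)  # all points lie to the right of loc

manual = lognorm_pdf(x, shape, loc, scale)
reference = stats.lognorm.pdf(x, shape, loc=loc, scale=scale)
print(np.allclose(manual, reference))
```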



Answer 3:

I think this will help. I struggled with the same issue for a long time and finally found a solution. In my case, I was trying to fit some data to the lognormal distribution using the scipy.stats.lognorm module. However, once I had the model parameters, I could not find a way to reproduce my results from the mean and std of my data.

In the code below, I first show how to produce a normally distributed data sample from the mean and std parameters using the scipy.stats.norm module. Using those data, I fit a normal model (norm_dist_fitted) and also create a normal model from the mean and standard deviation (mu, sigma) extracted from the data.

The original model that produced the data, the fitted model, and the model built from the (mu, sigma) pair are compared in a graph.

Fig1


In the next section of the code, I use the normal data to produce a lognormally distributed sample: the lognormal sample is simply the exponential of the original sample. Note that exp(mu) and exp(sigma) are not the mean and standard deviation of the exponentiated sample. exp(mu) is its median, which is what scipy calls scale, while sigma itself, unchanged, is the shape parameter s; the mean of exp(x) is exp(mu + sigma**2/2).

I then fit a lognormal to the generated data, which satisfies the lognormal model's assumption because the log of my sample exp(x) is normally distributed.

To produce a lognormal model from the mean and standard deviation of your original data (x) the code will be:

lognorm_dist = scipy.stats.lognorm(s=sigma, loc=0, scale=np.exp(mu)) 

However, if your data is already in the exponential space (exp(x)), then you have to use:

    muX = np.mean(np.log(x))
    sigmaX = np.std(np.log(x))
    scipy.stats.lognorm(s=sigmaX, loc=0, scale=np.exp(muX))
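As a sanity check on that recipe, the sketch below compares the log-moments of some already-lognormal data with what lognorm.fit returns when loc is held at 0. The data are synthetic and the values (mean=2.0, sigma=0.7, seed) are assumptions for illustration; the two routes should agree closely, with the fitted scale matching exp(muX) rather than muX.

```python
import numpy as np
from scipy import stats

# Synthetic, already-lognormal data (assumed parameters for illustration)
rng = np.random.default_rng(1)
x = rng.lognormal(mean=2.0, sigma=0.7, size=3000)

# Moments of the log of the data
muX = np.mean(np.log(x))
sigmaX = np.std(np.log(x))  # ddof=0, i.e. the maximum-likelihood estimate

# Maximum-likelihood fit with the location held at 0
shape, loc, scale = stats.lognorm.fit(x, floc=0)

print(sigmaX, shape)       # shape should be close to sigmaX
print(np.exp(muX), scale)  # scale should be close to exp(muX)
```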

Fig2

    import numpy as np
    import scipy.stats
    import matplotlib.pyplot as plt
    import seaborn as sns

    mu = 10      # Mean of the normal sample (the log of the lognormal data)
    sigma = 1.5  # Standard deviation of the normal sample
    N = 2000     # Number of samples

    norm_dist = scipy.stats.norm(loc=mu, scale=sigma)  # Create random process
    x = norm_dist.rvs(size=N)                          # Generate samples

    # Fit normal
    fitting_params = scipy.stats.norm.fit(x)
    norm_dist_fitted = scipy.stats.norm(*fitting_params)
    t = np.linspace(np.min(x), np.max(x), 100)

    # Plot normals
    # histplot(stat='density') replaces the deprecated distplot(norm_hist=True, kde=False)
    f, ax = plt.subplots(1, sharex='col', figsize=(10, 5))
    sns.histplot(x, ax=ax, stat='density',
                 label='Data X~N(mu={0:.1f}, sigma={1:.1f})'.format(mu, sigma))
    ax.plot(t, norm_dist_fitted.pdf(t), lw=2, color='r',
            label='Fitted Model X~N(mu={0:.1f}, sigma={1:.1f})'.format(norm_dist_fitted.mean(), norm_dist_fitted.std()))
    ax.plot(t, norm_dist.pdf(t), lw=2, color='g', ls=':',
            label='Original Model X~N(mu={0:.1f}, sigma={1:.1f})'.format(norm_dist.mean(), norm_dist.std()))
    ax.legend(loc='lower right')
    plt.show()

    # The lognormal model fits a variable whose log is normal.
    # We create such a variable by exponentiating the previous one;
    # exp(x) is always positive, as the lognormal model requires.
    x_exp = np.exp(x)
    mu_exp = np.exp(mu)  # scale (median) of exp(x), used as the starting guess for the fit

    fitting_params_lognormal = scipy.stats.lognorm.fit(x_exp, floc=0, scale=mu_exp)
    s_fit, loc_fit, scale_fit = fitting_params_lognormal
    lognorm_dist_fitted = scipy.stats.lognorm(*fitting_params_lognormal)
    t = np.linspace(np.min(x_exp), np.max(x_exp), 100)

    # Here is the magic I was looking for a long long time
    lognorm_dist = scipy.stats.lognorm(s=sigma, loc=0, scale=np.exp(mu))

    # The trick is to understand these two things:
    # 1. If X is NORMAL with MU and SIGMA, then
    #    EXP(X) ~ scipy.stats.lognorm(s=sigma, loc=0, scale=np.exp(mu))
    # 2. If your variable (x) is already LOGNORMAL, the model is
    #    scipy.stats.lognorm(s=sigmaX, loc=0, scale=np.exp(muX)) with:
    #      - muX = np.mean(np.log(x))
    #      - sigmaX = np.std(np.log(x))

    # Plot lognormals (mu and sigma in the labels refer to the log of the variable)
    f, ax = plt.subplots(1, sharex='col', figsize=(10, 5))
    sns.histplot(x_exp, ax=ax, stat='density',
                 label='Data X~N(mu={0:.1f}, sigma={1:.1f})\n exp(X)~LogNorm(mu={0:.1f}, sigma={1:.1f})'.format(mu, sigma))
    ax.plot(t, lognorm_dist_fitted.pdf(t), lw=2, color='r',
            label='Fitted Model exp(X)~LogNorm(mu={0:.1f}, sigma={1:.1f})'.format(np.log(scale_fit), s_fit))
    ax.plot(t, lognorm_dist.pdf(t), lw=2, color='g', ls=':',
            label='Original Model exp(X)~LogNorm(mu={0:.1f}, sigma={1:.1f})'.format(mu, sigma))
    ax.legend(loc='lower right')
    plt.show()

