I want to fit lognormal distribution to my data, using python scipy.stats.lognormal.fit
. According to the manual, fit
returns shape, loc, scale parameters. But, lognormal distribution normally needs only two parameters: mean and standard deviation.
How to interpret the results from scipy fit
function? How to get mean and std.dev.?
The distributions in scipy are coded in a generic way wrt two parameter location and scale so that location is the parameter (loc
) which shifts the distribution to the left or right, while scale
is the parameter which compresses or stretches the distribution.
For the two parameter lognormal distribution, the "mean" and "std dev" correspond to log(scale
) and shape
(you can let loc=0
).
The following illustrates how to fit a lognormal distribution to find the two parameters of interest:
In [56]: import numpy as np In [57]: from scipy import stats In [58]: logsample = stats.norm.rvs(loc=10, scale=3, size=1000) # logsample ~ N(mu=10, sigma=3) In [59]: sample = np.exp(logsample) # sample ~ lognormal(10, 3) In [60]: shape, loc, scale = stats.lognorm.fit(sample, floc=0) # hold location to 0 while fitting In [61]: shape, loc, scale Out[61]: (2.9212650122639419, 0, 21318.029350592606) In [62]: np.log(scale), shape # mu, sigma Out[62]: (9.9673084420467362, 2.9212650122639419)
I just spend some time working this out and wanted to document it here: If you want to get the probability density (at point x
) from the three return values of lognorm.fit
(lets call them (shape, loc, scale)
), you need to use this formula:
x = 1 / (shape*((x-loc)/scale)*sqrt(2*pi)) * exp(-1/2*(log((x-loc)/scale)/shape)**2) / scale
So as an equation that is (loc
is
, shape
is σ
and scale
is α
):
I think this will help. I was looking for the same issue for a long time and finally found a solution for my problem. In my case, I was trying to fit some data to the lognormal distribution using scipy.stats.lognorm
module. However, when I finally got the model parameters, I could not find a way to replicate my results using the mean and std from y data.
In the code below, I explain from the mean and std parameters how to produce a normally distributed data sample using scipy.stats.norm module. Using those data, I fit the normal model (norm_dist_fitted
) and also create a normal model using mean and standard deviation (mu, sigma
) extracted from the data.
Original model producing the data, fitted and produced-by-(mu-sigma)-pair is compared in a graph.
Fig1
In the next section of the code, I use the normal data to produce a lognormal-distributed sample. To do so notice that the lognormal samples will be the exponential of the original sample. Hence, the mean and standard deviation of the exponential sample will be (exp(mu)
and exp(sigma)
).
I fitted the produced data to a lognormal
(since the log of my sample (exp(x)) is normally distributed and follow the lognormal model assumptions.
To produce a lognormal model from the mean and standard deviation of your original data (x) the code will be:
lognorm_dist = scipy.stats.lognorm(s=sigma, loc=0, scale=np.exp(mu))
However, if your data is already in the exponential space (exp(x)), then you have to use:
muX = np.mean(np.log(x)) sigmaX = np.std(np.log(x)) scipy.stats.lognorm(s=sigmaX, loc=0, scale=muX)
Fig2
import scipy import matplotlib.pyplot as plt import seaborn as sns import numpy as np mu = 10 # Mean of sample !!! Make sure your data is positive for the lognormal example sigma = 1.5 # Standard deviation of sample N = 2000 # Number of samples norm_dist = scipy.stats.norm(loc=mu, scale=sigma) # Create Random Process x = norm_dist.rvs(size=N) # Generate samples # Fit normal fitting_params = scipy.stats.norm.fit(x) norm_dist_fitted = scipy.stats.norm(*fitting_params) t = np.linspace(np.min(x), np.max(x), 100) # Plot normals f, ax = plt.subplots(1, sharex='col', figsize=(10, 5)) sns.distplot(x, ax=ax, norm_hist=True, kde=False, label='Data X~N(mu={0:.1f}, sigma={1:.1f})'.format(mu, sigma)) ax.plot(t, norm_dist_fitted.pdf(t), lw=2, color='r', label='Fitted Model X~N(mu={0:.1f}, sigma={1:.1f})'.format(norm_dist_fitted.mean(), norm_dist_fitted.std())) ax.plot(t, norm_dist.pdf(t), lw=2, color='g', ls=':', label='Original Model X~N(mu={0:.1f}, sigma={1:.1f})'.format(norm_dist.mean(), norm_dist.std())) ax.legend(loc='lower right') plt.show() # The lognormal model fits to a variable whose log is normal # We create our variable whose log is normal 'exponenciating' the previous variable x_exp = np.exp(x) mu_exp = np.exp(mu) sigma_exp = np.exp(sigma) fitting_params_lognormal = scipy.stats.lognorm.fit(x_exp, floc=0, scale=mu_exp) lognorm_dist_fitted = scipy.stats.lognorm(*fitting_params_lognormal) t = np.linspace(np.min(x_exp), np.max(x_exp), 100) # Here is the magic I was looking for a long long time lognorm_dist = scipy.stats.lognorm(s=sigma, loc=0, scale=np.exp(mu)) # The trick is to understand these two things: # 1. If the EXP of a variable is NORMAL with MU and STD -> EXP(X) ~ scipy.stats.lognorm(s=sigma, loc=0, scale=np.exp(mu)) # 2. If your variable (x) HAS THE FORM of a LOGNORMAL, the model will be scipy.stats.lognorm(s=sigmaX, loc=0, scale=muX) # with: # - muX = np.mean(np.log(x)) # - sigmaX = np.std(np.log(x)) # Plot lognormals f, ax = plt.subplots(1, sharex='col', figsize=(10, 5)) sns.distplot(x_exp, ax=ax, norm_hist=True, kde=False, label='Data exp(X)~N(mu={0:.1f}, sigma={1:.1f})\n X~LogNorm(mu={0:.1f}, sigma={1:.1f})'.format(mu, sigma)) ax.plot(t, lognorm_dist_fitted.pdf(t), lw=2, color='r', label='Fitted Model X~LogNorm(mu={0:.1f}, sigma={1:.1f})'.format(lognorm_dist_fitted.mean(), lognorm_dist_fitted.std())) ax.plot(t, lognorm_dist.pdf(t), lw=2, color='g', ls=':', label='Original Model X~LogNorm(mu={0:.1f}, sigma={1:.1f})'.format(lognorm_dist.mean(), lognorm_dist.std())) ax.legend(loc='lower right') plt.show()