Question
I have the following data measured using an instrument in the lab. Since the instrument collects particles of different sizes in bins based upon their diameter the measurements are essentially "binned":
import numpy as np
import matplotlib.pylab as plt
from lmfit import models
y = np.array([196, 486, 968, 2262, 3321, 4203, 15072, 46789, 95201, 303494, 421484, 327507, 138931, 27973])
bins = np.array([0.0150, 0.0306, 0.0548, 0.0944, 0.1540, 0.2560, 0.3830, 0.6050, 0.9510, 1.6400, 2.4800, 3.6700, 5.3800, 9.9100, 15])
bin_width = np.diff(bins)
x = bins[:-1] + bin_width / 2  # bin centers
When plotted, the data show one mode around 0.1 and another mode around 2 in the units of the x-scale.
In this research area it is common to fit "multimodal" lognormal distributions to such data. Given this, I have fitted the mode around 2 using LMFIT:
model = models.LognormalModel()
params = model.make_params(center=1.5, sigma=0.6, amplitude=2214337)
result = model.fit(y, params, x=x)
print(result.fit_report())
plt.plot(x, y, label='data')
plt.plot(x, result.best_fit, label='fit')
plt.xscale("log")
plt.yscale("log")
plt.legend()
plt.show()
As expected, this results in a good fit for the second mode around 2. My question is: how would I also go about fitting the mode around 0.1 (essentially, the sum of the two modes should fit the data)? I realise it could be argued that three modes would be better, but I assume that once I understand how to use two modes, adding a third should be trivial.
Answer 1:
lmfit Models can be added together, as with:
model = (models.LognormalModel(prefix='p1_') +
models.LognormalModel(prefix='p2_') +
models.LognormalModel(prefix='p3_') )
params = model.make_params(p1_center=0.3, p1_sigma=0.2, p1_amplitude=1e4,
p2_center=1.0, p2_sigma=0.4, p2_amplitude=1e6,
p3_center=1.5, p3_sigma=0.6, p3_amplitude=2e7)
In a composite model, each component Model gets its own "prefix" (any string) that is prepended to its parameter names. After the fit, you can get a dictionary of the Model's components with:
components = result.eval_components()
# {'p1_': Array, 'p2_': Array, 'p3_': Array}
for compname, comp in components.items():
    plt.plot(x, comp, label=compname)
For fitting data that you would represent on a semi-log or log-log plot, you might consider fitting a model to log(y). Otherwise, the fit will not be very sensitive to misfit at very low values of y.
Note that lmfit models and parameters support bounds. You may want to, or find that you need to, place bounds such as:
params['p1_amplitude'].min = 0
params['p1_sigma'].min = 0
params['p1_center'].max = 0.5
params['p3_center'].min = 1.0
Answer 2:
What you're trying to fit is a log-normal mixture distribution. You can simply take the log of your data and fit a Gaussian mixture instead:
import numpy as np
from sklearn.mixture import GaussianMixture
# Make data from two log-normal distributions
# NOTE: 2d array of shape (n_samples, n_features)
n = 10000
x = np.zeros((n,1))
x[:n//2] = np.random.lognormal(0,1, size=(n//2,1))
x[n//2:] = np.random.lognormal(2,0.5, size=(n//2,1))
# Log transform the data
x_transformed = np.log(x)
# Make gaussian mixture model.
# n_init makes multiple initial guesses and
# depending on data, 1 might be good enough
# Decrease tolerance for speedup or increase for better precision
m = GaussianMixture(n_components=2, n_init=10, tol=1e-6)
# Fit the model
m.fit(x_transformed)
# Get the fitted parameters
# NOTE: covariances are stdev**2
print(m.weights_) # [0.50183897 0.49816103]
print(m.means_) # [1.99866785, -0.00528186]
print(m.covariances_) # [0.25227372,0.99692494]
Source: https://stackoverflow.com/questions/59034668/fitting-multimodal-lognormal-distributions-to-data-using-python