Python: two-curve Gaussian fitting with non-linear least-squares

Submitted anonymously (unverified) on 2019-12-03 01:58:03

Question:

My knowledge of maths is limited, which is probably why I am stuck. I have a spectrum to which I am trying to fit two Gaussian peaks. I can fit the largest peak, but I cannot fit the smallest. I understand that I need to sum the Gaussian functions for the two peaks, but I do not know where I have gone wrong. An image of my current output is shown:

The blue line is my data and the green line is my current fit. There is a shoulder to the left of the main peak in my data which I am currently trying to fit, using the following code:

import matplotlib.pyplot as plt
import numpy as np
from scipy.optimize import leastsq

time = []
counts = []

# Read two-column data: time and counts
for i in open('/some/folder/to/file.txt', 'r'):
    segs = i.split()
    time.append(float(segs[0]))
    counts.append(float(segs[1]))

time_array = np.array(time, dtype=float)
counts_array = np.array(counts, dtype=float)

# Sum of two Gaussian peaks, each with its own baseline, amplitude, centre and width
def model(time_array0, coeffs0):
    a = coeffs0[0] + coeffs0[1] * np.exp(-((time_array0 - coeffs0[2]) / coeffs0[3])**2)
    b = coeffs0[4] + coeffs0[5] * np.exp(-((time_array0 - coeffs0[6]) / coeffs0[7])**2)
    return a + b

def residuals(coeffs, counts_array, time_array):
    return counts_array - model(time_array, coeffs)

# 0 = baseline, 1 = amplitude, 2 = centre, 3 = width (repeated for the second peak)
peak1 = np.array([0, 6337, 16.2, 4.47, 0, 2300, 13.5, 2], dtype=float)
#peak2 = np.array([0, 2300, 13.5, 2], dtype=float)

x, flag = leastsq(residuals, peak1, args=(counts_array, time_array))
#z, flag = leastsq(residuals, peak2, args=(counts_array, time_array))

plt.plot(time_array, counts_array)
plt.plot(time_array, model(time_array, x), color='g')
#plt.plot(time_array, model(time_array, z), color='r')
plt.show()

Answer 1:

This code worked for me, provided that you are only fitting a function that is a combination of two Gaussian distributions.

I just made a residuals function that adds two Gaussian functions and then subtracts them from the real data.

The parameters (p) that I passed to SciPy's least-squares function include: the mean of the first Gaussian function (m), the difference between the means of the first and second Gaussian functions (dm, i.e. the horizontal shift), the standard deviation of the first (sd1), and the standard deviation of the second (sd2).

import numpy as np
from scipy.optimize import leastsq
import matplotlib.pyplot as plt

######################################
# Setting up test data
def norm(x, mean, sd):
    norm = []
    for i in range(x.size):
        norm += [1.0 / (sd * np.sqrt(2 * np.pi)) * np.exp(-(x[i] - mean)**2 / (2 * sd**2))]
    return np.array(norm)

mean1, mean2 = 0, -2
std1, std2 = 0.5, 1

x = np.linspace(-20, 20, 500)
y_real = norm(x, mean1, std1) + norm(x, mean2, std2)

######################################
# Solving
m, dm, sd1, sd2 = [5, 10, 1, 1]
p = [m, dm, sd1, sd2]  # Initial guesses for leastsq
y_init = norm(x, m, sd1) + norm(x, m + dm, sd2)  # For final comparison plot

def res(p, y, x):
    m, dm, sd1, sd2 = p
    m1 = m
    m2 = m1 + dm
    y_fit = norm(x, m1, sd1) + norm(x, m2, sd2)
    err = y - y_fit
    return err

plsq = leastsq(res, p, args=(y_real, x))

y_est = norm(x, plsq[0][0], plsq[0][2]) + norm(x, plsq[0][0] + plsq[0][1], plsq[0][3])

plt.plot(x, y_real, label='Real Data')
plt.plot(x, y_init, 'r.', label='Starting Guess')
plt.plot(x, y_est, 'g.', label='Fitted')
plt.legend()
plt.show()
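The same idea carries over to the question's parameterization (one amplitude, centre, and width per peak). Below is a minimal sketch along those lines using scipy.optimize.curve_fit, which wraps the same least-squares machinery; the time axis, noise level, and "true" peak values here are made-up stand-ins for the asker's data, not values from the question.

import numpy as np
from scipy.optimize import curve_fit

# One shared baseline plus two Gaussian peaks (amplitude, centre, width each)
def two_gauss(t, base, a1, c1, w1, a2, c2, w2):
    g1 = a1 * np.exp(-((t - c1) / w1)**2)
    g2 = a2 * np.exp(-((t - c2) / w2)**2)
    return base + g1 + g2

# Hypothetical stand-in data shaped roughly like the question's spectrum
t = np.linspace(0, 30, 300)
rng = np.random.default_rng(0)
counts = two_gauss(t, 50, 6000, 16.2, 4.5, 2300, 13.5, 2.0) + rng.normal(0, 50, t.size)

# Initial guesses: baseline, then (amplitude, centre, width) per peak
p0 = [0, 6337, 16.2, 4.47, 2300, 13.5, 2]
popt, pcov = curve_fit(two_gauss, t, counts, p0=p0)
print(popt)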



Answer 2:

You can use Gaussian mixture models from scikit-learn:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
from sklearn import mixture

# GaussianMixture is the current scikit-learn name for the old mixture.GMM;
# it expects a 2-D array of samples, hence the reshape
clf = mixture.GaussianMixture(n_components=2, covariance_type='full')
clf.fit(yourdata.reshape(-1, 1))
m1, m2 = clf.means_.ravel()
w1, w2 = clf.weights_
c1, c2 = clf.covariances_.ravel()

# Overlay each weighted component density on a normalised histogram
histdist = plt.hist(yourdata, 100, density=True)
bins = histdist[1]
plt.plot(bins, w1 * norm.pdf(bins, m1, np.sqrt(c1)), linewidth=3)
plt.plot(bins, w2 * norm.pdf(bins, m2, np.sqrt(c2)), linewidth=3)
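One caveat that matters for the question: a mixture model is fitted to raw samples drawn from a distribution, not to (x, y) curve points like a measured spectrum, so this approach only applies if you have (or can reconstruct) event-level data. A minimal sketch with made-up synthetic samples:

import numpy as np
from sklearn import mixture

# Hypothetical samples: two Gaussian populations (means 0 and -2, invented here)
rng = np.random.default_rng(0)
samples = np.concatenate([rng.normal(0, 0.5, 5000),
                          rng.normal(-2, 1.0, 5000)]).reshape(-1, 1)

clf = mixture.GaussianMixture(n_components=2, covariance_type='full')
clf.fit(samples)
print(clf.means_.ravel())   # recovers roughly 0 and -2, in either order
print(clf.weights_)         # roughly [0.5, 0.5]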

You can also use the function below to fit as many Gaussians as you want via the ncomp parameter:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
from sklearn import mixture

def fit_mixture(data, ncomp=2, doplot=False):
    clf = mixture.GaussianMixture(n_components=ncomp, covariance_type='full')
    clf.fit(data.reshape(-1, 1))
    ml = clf.means_
    wl = clf.weights_
    cl = clf.covariances_
    ms = [m[0] for m in ml]
    cs = [np.sqrt(c[0][0]) for c in cl]  # standard deviations
    ws = [w for w in wl]
    if doplot:
        histo = plt.hist(data, 200, density=True)
        for w, m, c in zip(ws, ms, cs):
            # c is already a standard deviation, so pass it straight to the pdf
            plt.plot(histo[1], w * norm.pdf(histo[1], m, c), linewidth=3)
    return ms, cs, ws
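For instance, called on synthetic samples like those in the earlier sketch (this snippet assumes the imports and fit_mixture definition above; the data is made up):

rng = np.random.default_rng(0)
samples = np.concatenate([rng.normal(0, 0.5, 5000),
                          rng.normal(-2, 1.0, 5000)])

means, stds, weights = fit_mixture(samples, ncomp=2, doplot=True)
print(means)   # roughly [0, -2]; component order is not guaranteed
plt.show()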


Answer 3:

Coeffs 0 and 4 are degenerate: there is absolutely nothing in the data that can decide between them. You should use a single zero-level parameter instead of two (i.e. remove one of them from your code). This is probably what is stopping your fit (ignore the comments here saying this is not possible; there are clearly at least two peaks in that data, and you should certainly be able to fit them).

(It may not be clear why I am suggesting this, but what is happening is that coeffs 0 and 4 can cancel each other out. They can both be zero, or one could be 100 and the other -100; either way, the fit is just as good. This "confuses" the fitting routine, which spends its time trying to work out what they should be when there is no single right answer: whatever value one takes, the other can just be the negative of that, and the fit will be the same.)
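To see the cancellation concretely, here is a small check using the model and starting values from the question (the time axis is an arbitrary stand-in):

import numpy as np

# The question's model: two Gaussians, each with its own baseline (coeffs 0 and 4)
def model(t, c):
    a = c[0] + c[1] * np.exp(-((t - c[2]) / c[3])**2)
    b = c[4] + c[5] * np.exp(-((t - c[6]) / c[7])**2)
    return a + b

t = np.linspace(0, 30, 100)
p = np.array([0, 6337, 16.2, 4.47, 0, 2300, 13.5, 2], dtype=float)
q = p.copy()
q[0] += 100   # raise one baseline...
q[4] -= 100   # ...lower the other by the same amount
print(np.allclose(model(t, p), model(t, q)))   # True: the curves are identical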

In fact, from the plot, it looks like there may be no need for a zero level at all. I would try dropping both of those and seeing how the fit looks.

Also, there is no need to fit coeffs 1 and 5 (or the zero level) in the least squares. Instead, because the model is linear in those parameters, you can calculate their values exactly at each iteration. This will make things faster, but it is not critical. I just noticed you say your maths is not so good, so feel free to ignore this one.
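To illustrate the linearity point: with the centres and widths held fixed, the model is a linear combination of a constant and two Gaussian shapes, so the baseline and the two amplitudes can be solved for directly. A sketch of that inner linear solve (the function name and argument layout here are mine, not from the answer):

import numpy as np

def linear_amplitudes(t, counts, c1, w1, c2, w2):
    # With centres (c1, c2) and widths (w1, w2) fixed, build the design matrix:
    # a constant column plus one column per Gaussian shape
    A = np.column_stack([
        np.ones_like(t),
        np.exp(-((t - c1) / w1)**2),
        np.exp(-((t - c2) / w2)**2),
    ])
    # Ordinary linear least squares gives baseline and amplitudes in one shot
    coeffs, *_ = np.linalg.lstsq(A, counts, rcond=None)
    base, amp1, amp2 = coeffs
    return base, amp1, amp2

The outer non-linear search then only has to optimise the four shape parameters, which is the speed-up the answer alludes to.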


