Question:
My knowledge of maths is limited, which is probably why I am stuck. I have a spectrum to which I am trying to fit two Gaussian peaks. I can fit the largest peak, but I cannot fit the smaller one. I understand that I need to sum the Gaussian functions for the two peaks, but I do not know where I have gone wrong. My current output:

[Figure: plot of the current output]
The blue line is my data and the green line is my current fit. My data has a shoulder to the left of the main peak, which I am trying to fit using the following code:
```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import leastsq

# Read two whitespace-separated columns: time and counts
time = []
counts = []
with open('/some/folder/to/file.txt', 'r') as f:
    for line in f:
        segs = line.split()
        time.append(float(segs[0]))
        counts.append(float(segs[1]))

time_array = np.array(time)
counts_array = np.array(counts)

def model(time_array0, coeffs0):
    # Each Gaussian carries its own baseline: coeffs0[0] and coeffs0[4]
    a = coeffs0[0] + coeffs0[1] * np.exp(-((time_array0 - coeffs0[2]) / coeffs0[3])**2)
    b = coeffs0[4] + coeffs0[5] * np.exp(-((time_array0 - coeffs0[6]) / coeffs0[7])**2)
    c = a + b
    return c

def residuals(coeffs, counts_array, time_array):
    return counts_array - model(time_array, coeffs)

# 0 = baseline, 1 = amplitude, 2 = centre, 3 = width (then the same for peak 2)
peak1 = np.array([0, 6337, 16.2, 4.47, 0, 2300, 13.5, 2], dtype=float)
#peak2 = np.array([0, 2300, 13.5, 2], dtype=float)

x, flag = leastsq(residuals, peak1, args=(counts_array, time_array))
#z, flag = leastsq(residuals, peak2, args=(counts_array, time_array))

plt.plot(time_array, counts_array)
plt.plot(time_array, model(time_array, x), color='g')
#plt.plot(time_array, model(time_array, z), color='r')
plt.show()
```
Answer 1:
This code worked for me, provided that you are only fitting a function that is a sum of two (unit-area) Gaussian distributions.
I wrote a residuals function that adds two Gaussian functions and subtracts the sum from the real data.
The parameters (p) that I passed to SciPy's least-squares function (`scipy.optimize.leastsq`) are: the mean of the first Gaussian (m), the difference between the means of the first and second Gaussians (dm, i.e. the horizontal shift), the standard deviation of the first (sd1), and the standard deviation of the second (sd2).
```python
import numpy as np
from scipy.optimize import leastsq
import matplotlib.pyplot as plt

######################################
# Setting up test data
def norm(x, mean, sd):
    # Normal probability density (unit area), evaluated element-wise on the array x
    return 1.0 / (sd * np.sqrt(2 * np.pi)) * np.exp(-(x - mean)**2 / (2 * sd**2))

mean1, mean2 = 0, -2
std1, std2 = 0.5, 1

x = np.linspace(-20, 20, 500)
y_real = norm(x, mean1, std1) + norm(x, mean2, std2)

######################################
# Solving
m, dm, sd1, sd2 = [5, 10, 1, 1]
p = [m, dm, sd1, sd2]  # Initial guesses for leastsq
y_init = norm(x, m, sd1) + norm(x, m + dm, sd2)  # For final comparison plot

def res(p, y, x):
    m, dm, sd1, sd2 = p
    m1 = m
    m2 = m1 + dm
    y_fit = norm(x, m1, sd1) + norm(x, m2, sd2)
    err = y - y_fit
    return err

plsq = leastsq(res, p, args=(y_real, x))

y_est = norm(x, plsq[0][0], plsq[0][2]) + norm(x, plsq[0][0] + plsq[0][1], plsq[0][3])

plt.plot(x, y_real, label='Real Data')
plt.plot(x, y_init, 'r.', label='Starting Guess')
plt.plot(x, y_est, 'g.', label='Fitted')
plt.legend()
plt.show()
```
Answer 2:
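You can use Gaussian mixture models from scikit-learn. Note that a mixture model is fitted to the individual samples (for example, raw event times), not to a spectrum that has already been binned into (time, counts) pairs:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
from sklearn import mixture

# yourdata: 1-D array of samples; scikit-learn expects shape (n_samples, n_features)
X = np.asarray(yourdata).reshape(-1, 1)

# mixture.GaussianMixture replaces the old mixture.GMM class, which has been removed
clf = mixture.GaussianMixture(n_components=2, covariance_type='full')
clf.fit(X)

m1, m2 = clf.means_.ravel()          # component means
w1, w2 = clf.weights_                # mixing weights
c1, c2 = clf.covariances_.ravel()    # component variances (1-D data)

# Normalised histogram of the data, with each weighted component drawn on top
_, bins, _ = plt.hist(X.ravel(), 100, density=True)
plt.plot(bins, w1 * norm.pdf(bins, m1, np.sqrt(c1)), linewidth=3)
plt.plot(bins, w2 * norm.pdf(bins, m2, np.sqrt(c2)), linewidth=3)
plt.show()
```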
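You can also use the function below to fit as many Gaussians as you want, chosen with the `ncomp` parameter:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
from sklearn import mixture

def fit_mixture(data, ncomp=2, doplot=False):
    """Fit ncomp Gaussians to 1-D samples; return means, standard deviations, weights."""
    X = np.asarray(data).reshape(-1, 1)
    clf = mixture.GaussianMixture(n_components=ncomp, covariance_type='full')
    clf.fit(X)
    ms = list(clf.means_.ravel())                 # means
    cs = list(np.sqrt(clf.covariances_.ravel()))  # standard deviations
    ws = list(clf.weights_)                       # mixing weights
    if doplot:
        _, bins, _ = plt.hist(X.ravel(), 200, density=True)
        for w, m, c in zip(ws, ms, cs):
            # c is already a standard deviation, so it goes to norm.pdf as-is
            plt.plot(bins, w * norm.pdf(bins, m, c), linewidth=3)
        plt.show()
    return ms, cs, ws
```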
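For intuition about what the mixture fit does, here is a minimal expectation-maximisation (EM) sketch for 1-D data in plain NumPy. It is not scikit-learn's implementation, which adds k-means initialisation, convergence checks and covariance regularisation; the function name `em_1d`, the fixed iteration count and the synthetic samples (using the means and widths from Answer 1's test data) are assumptions for illustration.

```python
import numpy as np

def em_1d(data, means, sds, weights, n_iter=200):
    """Fit a 1-D Gaussian mixture by expectation-maximisation (illustrative sketch).
    means, sds, weights: initial guesses, one entry per component."""
    x = np.asarray(data, dtype=float)[:, None]      # samples, shape (n, 1)
    mu = np.asarray(means, dtype=float)[None, :]    # shape (1, k)
    sd = np.asarray(sds, dtype=float)[None, :]
    w = np.asarray(weights, dtype=float)[None, :]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample
        dens = w * np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted updates of weights, means and std devs
        nk = r.sum(axis=0, keepdims=True)
        w = nk / x.shape[0]
        mu = (r * x).sum(axis=0, keepdims=True) / nk
        sd = np.sqrt((r * (x - mu) ** 2).sum(axis=0, keepdims=True) / nk)
    return mu.ravel(), sd.ravel(), w.ravel()

# Hypothetical samples: 3000 from N(0, 0.5) and 2000 from N(-2, 1)
rng = np.random.default_rng(0)
samples = np.concatenate([rng.normal(0.0, 0.5, 3000), rng.normal(-2.0, 1.0, 2000)])
mu, sd, w = em_1d(samples, means=[1.0, -3.0], sds=[1.0, 1.0], weights=[0.5, 0.5])
```

Each E-step assigns every sample a soft membership in each component; each M-step re-estimates the components from those memberships. The returned means, standard deviations and weights play the same role as `ms`, `cs` and `ws` in `fit_mixture`.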
Answer 3:
Coefficients 0 and 4 are degenerate: nothing in the data can distinguish between them. You should use a single zero-level (baseline) parameter instead of two, i.e. remove one of them from your code. This is probably what is stopping your fit. (Ignore the comments here saying this is not possible: there are clearly at least two peaks in that data, and you should certainly be able to fit them.)
(It may not be clear why I am suggesting this, but coefficients 0 and 4 can cancel each other out. They can both be zero, or one could be 100 and the other -100; either way, the fit is just as good, because only their sum affects the model. This "confuses" the fitting routine, which spends its time trying to work out what they should be when there is no single right answer: whatever value one takes, the other can compensate, and the fit will be the same.)
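The degeneracy can be checked numerically. This sketch reuses the question's model and starting coefficients on a hypothetical time grid `t`; the helper `dmodel` (a central-difference derivative) is introduced only for illustration.

```python
import numpy as np

def model(t, c):
    """The question's model: each peak has its own baseline, c[0] and c[4]."""
    a = c[0] + c[1] * np.exp(-((t - c[2]) / c[3]) ** 2)
    b = c[4] + c[5] * np.exp(-((t - c[6]) / c[7]) ** 2)
    return a + b

t = np.linspace(0, 30, 301)   # hypothetical time grid
base = np.array([0, 6337, 16.2, 4.47, 0, 2300, 13.5, 2], dtype=float)

# Move 100 counts from one baseline to the other: the model output is unchanged.
shifted = base.copy()
shifted[0], shifted[4] = 100.0, -100.0
same_output = np.allclose(model(t, base), model(t, shifted))

def dmodel(t, c, k, eps=1e-3):
    """Central-difference derivative of the model with respect to c[k]."""
    up, down = c.copy(), c.copy()
    up[k] += eps
    down[k] -= eps
    return (model(t, up) - model(t, down)) / (2 * eps)

# Both derivatives equal 1 at every time point, so the two Jacobian columns
# are identical: the data cannot pin down c[0] and c[4] separately.
d0 = dmodel(t, base, 0)
d4 = dmodel(t, base, 4)
print(same_output, np.allclose(d0, d4), np.allclose(d0, 1.0))  # True True True
```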
In fact, from the plot, it looks like there may be no need for a zero level at all. I would try dropping both of those parameters and seeing how the fit looks.
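A minimal sketch of the suggested change, assuming a hypothetical noise-free test spectrum in place of the real file: the model keeps one shared baseline, `c[0]`, and starts from the question's initial guesses with the second baseline removed. To drop the zero level entirely, remove `c[0]` and shift the remaining indices down by one.

```python
import numpy as np
from scipy.optimize import leastsq

def model(t, c):
    """Two Gaussian peaks sharing ONE baseline.
    c = [baseline, amp1, centre1, width1, amp2, centre2, width2]"""
    return (c[0]
            + c[1] * np.exp(-((t - c[2]) / c[3]) ** 2)
            + c[4] * np.exp(-((t - c[5]) / c[6]) ** 2))

def residuals(c, counts, t):
    return counts - model(t, c)

# Hypothetical noise-free spectrum: baseline 50, main peak at 16, shoulder at 12.5
t = np.linspace(0, 35, 351)
true = np.array([50, 6000, 16.0, 4.0, 2000, 12.5, 1.5])
counts = model(t, true)

# Starting guesses from the question, with the second baseline removed
guess = np.array([0, 6337, 16.2, 4.47, 2300, 13.5, 2], dtype=float)
fit, ier = leastsq(residuals, guess, args=(counts, t))
print(np.round(fit, 2))
```

Because each width enters the model only squared, a fitted width can come back negative; its absolute value is the width.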
Also, there is no need to fit coefficients 1 and 5 (the amplitudes) or the zero level in the least-squares search. Because the model is linear in those parameters, you can calculate their values directly at each iteration with ordinary linear least squares. This makes things faster but is not critical. I just noticed you say your maths is not so good, so you can probably ignore this one.
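This idea is usually called separable (or variable-projection) least squares. A minimal sketch, assuming a hypothetical noise-free test spectrum: `leastsq` searches only over the two centres and two widths, and for each trial `np.linalg.lstsq` returns the best baseline and amplitudes. The helper names `basis` and `solve_linear` are made up for this example.

```python
import numpy as np
from scipy.optimize import leastsq

def basis(t, p):
    """Design matrix for fixed nonlinear parameters p = [centre1, width1, centre2, width2].
    Columns: constant baseline, shape of peak 1, shape of peak 2."""
    c1, w1, c2, w2 = p
    return np.column_stack([
        np.ones_like(t),
        np.exp(-((t - c1) / w1) ** 2),
        np.exp(-((t - c2) / w2) ** 2),
    ])

def solve_linear(B, counts):
    """Best [baseline, amp1, amp2] for a given design matrix (linear least squares)."""
    coeffs, *_ = np.linalg.lstsq(B, counts, rcond=None)
    return coeffs

def residuals(p, counts, t):
    # The linear parameters are re-solved on every call, so leastsq only
    # searches over the four nonlinear ones.
    B = basis(t, p)
    return counts - B @ solve_linear(B, counts)

# Hypothetical noise-free spectrum: baseline 50, peaks (6000, 16, 4) and (2000, 12.5, 1.5)
t = np.linspace(0, 35, 351)
counts = (50
          + 6000 * np.exp(-((t - 16.0) / 4.0) ** 2)
          + 2000 * np.exp(-((t - 12.5) / 1.5) ** 2))

p0 = np.array([16.2, 4.47, 13.5, 2.0])   # centres and widths from the question
p_fit, ier = leastsq(residuals, p0, args=(counts, t))
baseline, amp1, amp2 = solve_linear(basis(t, p_fit), counts)
```

The nonlinear search now has four parameters instead of seven, and no starting guesses are needed for the baseline or amplitudes.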