Question
I am trying to fit a distribution with scipy's curve_fit. I tried to fit a one-component exponential function, which resulted in an almost straight line (see figure). I also tried a two-component exponential fit, which seemed to work nicely. "Two components" just means that part of the equation is repeated with a different set of parameters. Anyway, here is the one-component fit function:
# newsigma (the fixed Gaussian width) is a global defined elsewhere in the script
def Exponential(Z,w0,z0,Z0):
    z = Z - Z0
    termB = (newsigma**2 + z*z0) / (numpy.sqrt(2.0)*newsigma*z0)
    termA = (newsigma**2 - z*z0) / (numpy.sqrt(2.0)*newsigma*z0)
    return w0/2.0 * numpy.exp(-(z**2 / (2.0*newsigma**2))) * (numpy.exp(termA**2)*erfc(termA) + numpy.exp(termB**2)*erfc(termB))
and the fitting is done with
fitexp = curve_fit(Exponential,newx,y2)
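The product `numpy.exp(t**2)*erfc(t)`, which appears twice in the model, is exactly what `scipy.special.erfcx` (the scaled complementary error function) computes. A minimal sketch of the same one-component model written with it, assuming a fixed width `SIGMA` equal to the value `3.98098937586` hard-coded in the example further down (the snippet above reads it from the global `newsigma`); the function names are hypothetical. Besides being shorter, `erfcx` stays finite for large positive arguments, where `numpy.exp(t**2)` overflows (above roughly t = 26.6) and the naive product becomes `inf` or `nan`.

```python
import numpy as np
from scipy.special import erfc, erfcx

# Assumption: fixed Gaussian width, equal to the value hard-coded in Exponential2.
SIGMA = 3.98098937586


def exponential_erfcx(Z, w0, z0, Z0):
    """One-component model with exp(t**2)*erfc(t) replaced by erfcx(t)."""
    z = np.asarray(Z, dtype=float) - Z0
    a = z**2 / (2.0 * SIGMA**2)
    term_b = (SIGMA**2 + z * z0) / (np.sqrt(2.0) * SIGMA * z0)
    term_a = (SIGMA**2 - z * z0) / (np.sqrt(2.0) * SIGMA * z0)
    return w0 / 2.0 * np.exp(-a) * (erfcx(term_a) + erfcx(term_b))


def exponential_original(Z, w0, z0, Z0):
    """The formulation from the question, kept for comparison."""
    z = np.asarray(Z, dtype=float) - Z0
    term_b = (SIGMA**2 + z * z0) / (np.sqrt(2.0) * SIGMA * z0)
    term_a = (SIGMA**2 - z * z0) / (np.sqrt(2.0) * SIGMA * z0)
    return w0 / 2.0 * np.exp(-(z**2 / (2.0 * SIGMA**2))) * (
        np.exp(term_a**2) * erfc(term_a) + np.exp(term_b**2) * erfc(term_b))


Z = np.linspace(-58.0, 58.0, 117)
# The two versions agree wherever the original does not overflow.
print(np.allclose(exponential_erfcx(Z, 300.0, 8.0, 5.0),
                  exponential_original(Z, 300.0, 8.0, 5.0)))  # True

# For a large argument, exp(t**2) overflows and the naive product is not finite.
with np.errstate(over="ignore", invalid="ignore"):
    naive = np.exp(30.0**2) * erfc(30.0)
print(np.isfinite(naive), np.isfinite(erfcx(30.0)))  # False True
```

The parameter values 300, 8 and 5 are arbitrary test inputs, not fit results.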
Then, just as an experiment, I added two of the parameters of the two-component fit to the function signature, but did not use them in the calculation:
# w1 and z1 are accepted but never used in the calculation
def ExponentialNew(Z,w0,z0,w1,z1,Z0):
    z = Z - Z0
    termB = (newsigma**2 + z*z0) / (numpy.sqrt(2.0)*newsigma*z0)
    termA = (newsigma**2 - z*z0) / (numpy.sqrt(2.0)*newsigma*z0)
    return w0/2.0 * numpy.exp(-(z**2 / (2.0*newsigma**2))) * (numpy.exp(termA**2)*erfc(termA) + numpy.exp(termB**2)*erfc(termB))
And suddenly this works.
Now my question is: why? As you can see, there is absolutely no difference in how the fit function is calculated; it just receives two extra parameters that are never used. Shouldn't this give the same result?
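One way to see what actually differs between the two calls: when `p0` is omitted, `curve_fit` counts the parameters by introspecting the function signature and starts every one of them at 1, so `w0`, `z0` and `Z0` start from identical values in both versions. What does change is anything in the solver that depends on the number of parameters N. For example, `scipy.optimize.leastsq` (which `curve_fit` uses when no bounds are given) defaults to at most `200*(N+1)` function evaluations when no Jacobian is supplied, and the two unused parameters add all-zero columns to the finite-difference Jacobian. The sketch below only reproduces these documented rules with the standard library; the helpers `default_p0` and `default_maxfev` are hypothetical, the two functions are signature-only stubs, and this does not prove which internal difference changes the result in this particular case.

```python
import inspect


def Exponential(Z, w0, z0, Z0):
    return None  # stub: only the signature matters here


def ExponentialNew(Z, w0, z0, w1, z1, Z0):
    return None  # stub: only the signature matters here


def default_p0(f):
    """All-ones starting point, one entry per parameter after the first argument."""
    n = len(inspect.signature(f).parameters) - 1
    return [1.0] * n


def default_maxfev(n_params):
    """leastsq's default evaluation budget when no Jacobian is given."""
    return 200 * (n_params + 1)


for f in (Exponential, ExponentialNew):
    p0 = default_p0(f)
    print(f.__name__, p0, default_maxfev(len(p0)))
# Exponential [1.0, 1.0, 1.0] 800
# ExponentialNew [1.0, 1.0, 1.0, 1.0, 1.0] 1200
```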
@Andras_Deak Here is an actual example:
from scipy.special import erfc
import numpy
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
#setup data
x = [-58.,-54.,-50.,-46.,-42.,-38.,-34.,-30.,-26.,-22.,-18.,-14.,-10.,-6.,-2.,2.,6.,10.,14.,18.,22.,26.,30.,34.,38.,42.,46.,50.,54.,58.]
y = [23.06763817, 16.89802085, 17.83258379, 16.63446237, 13.81878965, 12.97965839, 14.30451789, 16.98288216, 22.26811491, 28.56756908, 33.06990344, 38.59842098, 54.19860393, 86.37381604, 137.47253315, 199.49724512, 238.66047662, 219.89405445, 160.68820199, 103.88901303, 65.92405727, 43.84596266, 31.5395342, 25.9610156, 22.71683709, 18.06740651, 13.85362374, 11.12867065, 10.36502799, 11.31855619]
y_err = [17.9823065, 4.13684885, 1.66490726, 2.4109372, 2.93359141, 1.9701747, 3.19214881, 3.65593012, 2.89089074, 3.58922121, 4.25505348, 4.72728874, 6.77736567, 11.3888196, 21.87771722, 39.0087495, 56.6910311, 51.7592369, 26.39750958, 10.62678862, 7.85893395, 8.11741621, 7.91731416, 7.07739132, 5.41818744, 6.11286843, 8.27070757, 7.85323065, 4.26885499, 0.9047867]
#function to fit
def Exponential2(Z, w0, z0, w1, z1, Z0):
    z = Z - Z0
    s = 3.98098937586
    a = z**2 / (2.0*s**2)
    b = (s**2 + z*z0) / (numpy.sqrt(2.0)*s*z0)
    c = (s**2 - z*z0) / (numpy.sqrt(2.0)*s*z0)
    d = (s**2 + z*z1) / (numpy.sqrt(2.0)*s*z1)
    e = (s**2 - z*z1) / (numpy.sqrt(2.0)*s*z1)
    return w0/2.0 * numpy.exp(-a) * (numpy.exp(c**2)*erfc(c) + numpy.exp(b**2)*erfc(b)) + w1/2.0 * numpy.exp(-a) * (numpy.exp(e**2)*erfc(e) + numpy.exp(d**2)*erfc(d))
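#derive and set initial guess
x, y, y_err = numpy.asarray(x), numpy.asarray(y), numpy.asarray(y_err)  # lists -> arrays
ymaxpos = x[numpy.argmax(y)]  # x position of the highest data point, used as initial Z0
p0_2 = [numpy.max(y),5,numpy.max(y)/2.0,20,ymaxpos]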
#fit
fitexp2 = curve_fit(Exponential2,x,y,p0=p0_2,sigma=y_err)
#get results
popt, pcov = fitexp2
perr = numpy.sqrt(numpy.diag(pcov))  # 1-sigma uncertainties of the fitted parameters
w0err, z0err, w1err, z1err = perr[:4]
w0, z0, w1, z1, Z0 = popt
#new x array for smoother curve
smoothx = numpy.arange(-58,59,0.1)
y2 = Exponential2(smoothx,w0,z0,w1,z1,Z0)
print('Exponential 2: w0: '+str(w0.round(3))+' +/- '+str(w0err.round(3))+' \t z0: '+str(z0.round(3))+' +/- '+str(z0err.round(3))+' \t w1: '+str(w1.round(3))+' +/- '+str(w1err.round(3))+' \t\t z1: '+str(z1.round(3))+' +/- '+str(z1err.round(3)))
#plot
fig = plt.figure()
ax = fig.add_subplot(111)
ax.errorbar(x,y,y_err,fmt='o',markersize=2,label='data')
ax.plot(smoothx,y2,label='fit',color='red')
ax.grid()
ax.legend()
plt.show()
As you can see, the plot looks good, but the returned value of z1 is totally unrealistic:
Exponential 2: w0: 312.608 +/- 36.764 z0: 8.263 +/- 1.158 w1: 12.689 +/- 9.138 z1: 1862257.883 +/- 45201809883.8
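The huge value of `z1` has a simple interpretation. In `Exponential2`, `d = s/(sqrt(2)*z1) + z/(sqrt(2)*s)` and `e = s/(sqrt(2)*z1) - z/(sqrt(2)*s)`, so as `z1` grows, `d` and `e` tend to `u` and `-u` with `u = z/(sqrt(2)*s)`. Since `erfc(u) + erfc(-u) = 2` and `u**2 == a`, the second term tends to `w1/2 * exp(-a) * 2*exp(a)`, which is just the constant `w1`: the second component degenerates into a flat baseline. The fit presumably uses it to model the slowly varying tails of the data (values of roughly 10 to 25 at the edges), for which the one-component model has no term, and once `z1` is far larger than the data range the curve barely depends on `z1`, which is consistent with its enormous uncertainty. The sketch below checks the limit numerically with the printed `w1` and `z1`; the helper name `second_component` is hypothetical.

```python
import numpy as np
from scipy.special import erfc

s = 3.98098937586             # fixed width from Exponential2
w1, z1 = 12.689, 1862257.883  # values printed in the question


def second_component(Z, w1, z1, Z0=0.0):
    """The w1/z1 term of Exponential2 on its own."""
    z = np.asarray(Z, dtype=float) - Z0
    a = z**2 / (2.0 * s**2)
    d = (s**2 + z * z1) / (np.sqrt(2.0) * s * z1)
    e = (s**2 - z * z1) / (np.sqrt(2.0) * s * z1)
    return w1 / 2.0 * np.exp(-a) * (np.exp(e**2) * erfc(e) + np.exp(d**2) * erfc(d))


Z = np.arange(-58.0, 59.0, 1.0)
vals = second_component(Z, w1, z1)
print(vals.min(), vals.max())  # both very close to w1 = 12.689
```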
Answer 1:
In my experience, curve_fit can sometimes act up and get stuck at the initial values of the parameters. I suspect that in your case, adding a few dummy parameters changed the heuristics of how the relevant parameters are initialized (although this contradicts the documentation, which states that when no initial values are given, they all default to 1).
It helps a lot in obtaining reliable fits if you specify reasonable bounds and initial values for your fitting parameters (I mean the p0 and bounds keywords). The fact that the default starting values are all 1 suggests that, for most use cases, the defaults won't cut it.
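Following that advice, here is a minimal sketch of a fit with explicit `p0` and `bounds`. To stay self-contained it fits synthetic data generated from the one-component model with made-up parameters (300, 8 and 5, plus Gaussian noise), not the asker's data; the width `SIGMA`, the bounds and the starting guesses are illustrative assumptions. Note that once `bounds` is given, `curve_fit` switches from the default `'lm'` method (`leastsq`) to the `'trf'` method of `least_squares`.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.special import erfcx

SIGMA = 3.98098937586  # assumed fixed width, as in Exponential2


def exponential(Z, w0, z0, Z0):
    """One-component model; erfcx(t) == exp(t**2) * erfc(t)."""
    z = Z - Z0
    a = z**2 / (2.0 * SIGMA**2)
    term_b = (SIGMA**2 + z * z0) / (np.sqrt(2.0) * SIGMA * z0)
    term_a = (SIGMA**2 - z * z0) / (np.sqrt(2.0) * SIGMA * z0)
    return w0 / 2.0 * np.exp(-a) * (erfcx(term_a) + erfcx(term_b))


# Synthetic data with made-up "true" parameters (not the question's data).
rng = np.random.default_rng(0)
x = np.arange(-58.0, 59.0, 4.0)
true = (300.0, 8.0, 5.0)
y = exponential(x, *true) + rng.normal(0.0, 2.0, x.size)

# Data-driven starting point: peak height, a guessed decay length, peak position.
p0 = [y.max(), 5.0, x[np.argmax(y)]]
# Keep w0 non-negative, z0 away from zero, and Z0 inside the data range.
bounds = ([0.0, 0.1, x.min()], [np.inf, 100.0, x.max()])
popt, pcov = curve_fit(exponential, x, y, p0=p0, bounds=bounds)
print(popt)
```

The starting guesses are deliberately rough; the point is that they are on the right scale, whereas an all-ones `p0` puts `Z0` and `w0` far from anything the data support.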
Source: https://stackoverflow.com/questions/40008017/scipy-curve-fit-strange-result