Question
I am trying to fit x-y data that look something like this:
x = np.linspace(-2, 2, 1000)
a = 0.5
yl = np.ones_like(x[x < a]) * -0.4 + np.random.normal(0, 0.05, x[x < a].shape[0])
yr = np.ones_like(x[x >= a]) * 0.4 + np.random.normal(0, 0.05, x[x >= a].shape[0])
y = np.concatenate((yl, yr))
plt.scatter(x, y, s=2, color='k')
I'm using a variation of the Heaviside step function
def f(x, a, b): return 0.5 * b * (np.sign(x - a))
and fitting with
popt, pcov = curve_fit(f, x, y, p0=p)
where p is some initial guess. For any p, curve_fit fits only b and leaves a at its initial value. For example:
popt, pcov = curve_fit(f, x, y, p0=[-1.0, 0])
gives popt = [-1., 0.20117665]
popt, pcov = curve_fit(f, x, y, p0=[.5, 2])
gives popt = [.5, 0.79902]
popt, pcov = curve_fit(f, x, y, p0=[1.5, -2])
gives popt = [1.5, 0.40128229]
Why is curve_fit not fitting a?
Answer 1:
As mentioned by others, curve_fit (like the other derivative-based solvers in scipy.optimize) works well for optimizing continuous but not discrete variables. These solvers work by making small (like, at the 1.e-7 level) changes to the parameter values, seeing what (if any) change that makes in the result, and using that change to refine the values until the smallest residual is found. With your model function using np.sign:
def f(x, a, b): return 0.5 * b * (np.sign(x - a))
such a small change in the value of a will not change the model or fit result at all. That is, the fit will first try the starting value of, say, a=-1.0 or a=0.5, and then will try a=-0.999999995 or a=0.500000005. Both give the same result for np.sign(x-a), because no data point lies between the two values of a. The fit does not know that it would need to change a by a much larger amount, on the order of the spacing between x values (about 0.004 here), to have any effect on the result. It cannot know this. np.sign() and np.sin() differ by one letter, but behave very differently in this respect.
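The point above can be checked numerically. The following is only a minimal sketch: the step size eps = 1.0e-7 and the evaluation point (a0, b0) are assumptions chosen for illustration, not the exact values curve_fit uses internally. It estimates the finite-difference derivative of the model with respect to each parameter.

```python
import numpy as np

def f(x, a, b):
    return 0.5 * b * np.sign(x - a)

x = np.linspace(-2, 2, 1000)
a0, b0 = 0.5, 0.8   # assumed point at which to probe the model
eps = 1.0e-7        # assumed step, "at the 1.e-7 level"

base = f(x, a0, b0)
# forward-difference derivative of the model with respect to each parameter
d_a = (f(x, a0 + eps, b0) - base) / eps
d_b = (f(x, a0, b0 + eps) - base) / eps

print("max |d model / d a| =", np.abs(d_a).max())
print("max |d model / d b| =", np.abs(d_b).max())
```

The derivative with respect to a is zero at every data point, so the solver has no information about which way to move a, while the derivative with respect to b is not zero, which is why b alone gets refined.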
It is pretty common for real data to take a step but to be sampled finely enough that the step does not happen completely within one sampling interval. In that case, you would be able to model the step with a variety of functional forms (linear ramp, error function, arc-tangent, logistic, etc). The thorough answer from @JamesPhilipps gives one approach. I would probably use lmfit (being one of its main authors) and be willing to guess starting values for the parameters from looking at the data, perhaps:
import numpy as np
x = np.linspace(-2, 2, 1000)
a = 0.5
yl = np.ones_like(x[x < a]) * -0.4 + np.random.normal(0, 0.05, x[x < a].shape[0])
yr = np.ones_like(x[x >= a]) * 0.4 + np.random.normal(0, 0.05, x[x >= a].shape[0])
y = np.concatenate((yl, yr))
from lmfit.models import StepModel, ConstantModel
model = StepModel() + ConstantModel()
params = model.make_params(center=0, sigma=1, amplitude=1., c=-0.5)
result = model.fit(y, params, x=x)
print(result.fit_report())
import matplotlib.pyplot as plt
plt.scatter(x, y, label='data')
plt.plot(x, result.best_fit, marker='o', color='r', label='fit')
plt.show()
which would give a good fit and print out results of
[[Model]]
(Model(step, form='linear') + Model(constant))
[[Fit Statistics]]
# fitting method = leastsq
# function evals = 50
# data points = 1000
# variables = 4
chi-square = 2.32729556
reduced chi-square = 0.00233664
Akaike info crit = -6055.04839
Bayesian info crit = -6035.41737
## Warning: uncertainties could not be estimated:
[[Variables]]
amplitude: 0.80013762 (init = 1)
center: 0.50083312 (init = 0)
sigma: 4.6009e-04 (init = 1)
c: -0.40006255 (init = -0.5)
Note that it will find the center of the step because it assumed there was some finite width (sigma) to the step, but then found that width to be smaller than the step size in x. But also note that it cannot calculate the uncertainties in the parameters because, as above, a small change in center (your a) near the solution does not change the resulting fit. FWIW the StepModel can use a linear, error-function, arc-tangent, or logistic as the step function.
If you had constructed the test data to have a small width to the step, say with something like
from scipy.special import erf
y = 0.638 * erf((x-0.574)/0.005) + np.random.normal(0, 0.05, len(x))
then the fit would have been able to find the best solution and evaluate the uncertainties.
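As a rough illustration using plain curve_fit rather than lmfit, here is a minimal sketch of fitting an error-function step. The function name smooth_step, the fixed random seed, the starting values in p0, and the step width of 0.05 (wider than the 0.005 above, so that several samples fall inside the transition) are all assumptions made for this example.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.special import erf

def smooth_step(x, amp, center, sigma):
    """Error-function step with a finite width sigma (hypothetical helper)."""
    return amp * erf((x - center) / sigma)

rng = np.random.default_rng(42)  # fixed seed so the sketch is repeatable
x = np.linspace(-2, 2, 1000)
# assumed test data: a step of width 0.05 centered at 0.574, plus noise
y = 0.638 * erf((x - 0.574) / 0.05) + rng.normal(0, 0.05, x.size)

# starting values guessed from looking at the data
p0 = [0.5, 0.4, 0.2]
popt, pcov = curve_fit(smooth_step, x, y, p0=p0)
perr = np.sqrt(np.diag(pcov))  # one-sigma parameter uncertainties

for name, val, err in zip(("amp", "center", "sigma"), popt, perr):
    print(f"{name:>6s} = {val:.4f} +/- {err:.4f}")
```

Because the model now changes smoothly as center moves, the derivative with respect to center is nonzero, so the center is refined and the covariance matrix is finite.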
I hope that explains why the fit with your model function could not refine the value of a, and what might be done about it.
Answer 2:
Here is a graphical Python fitter using your data and function, with scipy's differential_evolution genetic algorithm module used to provide the initial parameter estimates for curve_fit. That module uses the Latin Hypercube algorithm to ensure a thorough search of parameter space, which requires bounds within which to search. In this example, those bounds are taken from the data max and min values.
import numpy, scipy, matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.optimize import differential_evolution
import warnings
# generate data for testing
x = numpy.linspace(-2, 2, 1000)
a = 0.5
yl = numpy.ones_like(x[x < a]) * -0.4 + numpy.random.normal(0, 0.05, x[x < a].shape[0])
yr = numpy.ones_like(x[x >= a]) * 0.4 + numpy.random.normal(0, 0.05, x[x >= a].shape[0])
y = numpy.concatenate((yl, yr))
# alias data to match previous example
xData = x
yData = y
def func(x, a, b): # variation of the Heaviside step function
    return 0.5 * b * (numpy.sign(x - a))

# function for genetic algorithm to minimize (sum of squared error)
def sumOfSquaredError(parameterTuple):
    warnings.filterwarnings("ignore") # do not print warnings by genetic algorithm
    val = func(xData, *parameterTuple)
    return numpy.sum((yData - val) ** 2.0)

def generate_Initial_Parameters():
    # min and max used for bounds
    maxX = max(xData)
    minX = min(xData)
    parameterBounds = []
    parameterBounds.append([minX, maxX]) # search bounds for a
    parameterBounds.append([minX, maxX]) # search bounds for b
    # "seed" the numpy random number generator for repeatable results
    result = differential_evolution(sumOfSquaredError, parameterBounds, seed=3)
    return result.x

# by default, differential_evolution finishes by polishing its best result with a local minimizer (L-BFGS-B) inside the parameter bounds
geneticParameters = generate_Initial_Parameters()
# now call curve_fit without passing bounds from the genetic algorithm,
# just in case the best fit parameters are outside those bounds
fittedParameters, pcov = curve_fit(func, xData, yData, geneticParameters)
print('Fitted parameters:', fittedParameters)
print()
modelPredictions = func(xData, *fittedParameters)
absError = modelPredictions - yData
SE = numpy.square(absError) # squared errors
MSE = numpy.mean(SE) # mean squared errors
RMSE = numpy.sqrt(MSE) # Root Mean Squared Error, RMSE
Rsquared = 1.0 - (numpy.var(absError) / numpy.var(yData))
print()
print('RMSE:', RMSE)
print('R-squared:', Rsquared)
print()
##########################################################
# graphics output section
def ModelAndScatterPlot(graphWidth, graphHeight):
    f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
    axes = f.add_subplot(111)
    # first the raw data as a scatter plot
    axes.plot(xData, yData, 'D')
    # create data for the fitted equation plot
    xModel = numpy.linspace(min(xData), max(xData))
    yModel = func(xModel, *fittedParameters)
    # now the model as a line plot
    axes.plot(xModel, yModel)
    axes.set_xlabel('X Data') # X axis data label
    axes.set_ylabel('Y Data') # Y axis data label
    plt.show()
    plt.close('all') # clean up after using pyplot

graphWidth = 800
graphHeight = 600
ModelAndScatterPlot(graphWidth, graphHeight)
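Since the model only changes when a crosses a data point, the search over a can also be done exhaustively. The sketch below is a different technique from the differential_evolution approach above and is offered only as an illustration: it assumes candidate values of a at the midpoints between neighboring x samples, and for each candidate it solves for b in closed form, because the model is linear in b once a is fixed. The seeded random generator and the variable names are assumptions of this example.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable
x = np.linspace(-2, 2, 1000)
y = np.where(x < 0.5, -0.4, 0.4) + rng.normal(0, 0.05, x.size)

# candidate step locations: midpoints between neighboring samples,
# so np.sign(x - a) is never zero
candidates = 0.5 * (x[:-1] + x[1:])

best = None
for a in candidates:
    s = np.sign(x - a)
    # least-squares b for the model 0.5 * b * s with a held fixed
    b = 2.0 * np.dot(s, y) / np.dot(s, s)
    sse = np.sum((y - 0.5 * b * s) ** 2)
    if best is None or sse < best[0]:
        best = (sse, a, b)

sse_best, a_best, b_best = best
print("a =", a_best, "b =", b_best, "SSE =", sse_best)
```

Any a between the same two neighboring samples gives an identical fit, so the step location can only be resolved to within one sampling interval of x.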
Source: https://stackoverflow.com/questions/59479443/fitting-step-function-with-variation-in-the-step-location-with-scipy-optimize-cu