问题
I need to fit this formula
y ~ 1/(pi*a*(1+((x-2.15646)/a)^2))+1/(pi*b*(1+((x-2.16355)/b)^2))
to the variables x and y
x<- c(2.15011, 2.15035, 2.15060, 2.15084, 2.15109, 2.15133, 2.15157, 2.15182, 2.15206, 2.15231, 2.15255, 2.15280, 2.15304, 2.15329, 2.15353, 2.15377, 2.15402, 2.15426, 2.15451, 2.15475, 2.15500, 2.15524, 2.15549, 2.15573, 2.15597, 2.15622, 2.15646, 2.15671, 2.15695, 2.15720, 2.15744, 2.15769, 2.15793, 2.15817, 2.15842, 2.15866, 2.15891, 2.15915, 2.15940, 2.15964, 2.15989, 2.16013, 2.16037, 2.16062, 2.16086, 2.16111, 2.16135, 2.16160, 2.16184, 2.16209, 2.16233, 2.16257, 2.16282, 2.16306, 2.16331, 2.16355, 2.16380, 2.16404, 2.16429, 2.16453, 2.16477, 2.16502, 2.16526, 2.16551, 2.16575, 2.16600, 2.16624, 2.16649, 2.16673, 2.16697, 2.16722, 2.16746, 2.16771, 2.16795, 2.16820, 2.16844, 2.16869, 2.16893, 2.16917, 2.16942, 2.16966, 2.16991)
y<- c(3.77212, 3.79541, 3.84574, 3.91918, 4.01056, 4.11677, 4.23851, 4.37986, 4.54638, 4.74367, 4.97765, 5.25593, 5.58823, 5.98405, 6.44850, 6.98006, 7.57280, 8.22085, 8.92094, 9.66990, 10.45900, 11.26720, 12.05540, 12.76920, 13.34830, 13.74250, 13.92420, 13.89250, 13.67090, 13.29980, 12.82780, 12.30370, 11.76950, 11.25890, 10.80020, 10.41860, 10.13840, 9.98005, 9.95758, 10.07690, 10.33680, 10.73210, 11.25730, 11.90670, 12.67240, 13.54110, 14.49530, 15.51670, 16.58660, 17.67900, 18.75190, 19.74600, 20.59680, 21.24910, 21.66800, 21.83910, 21.76560, 21.46020, 20.94020, 20.22730, 19.35360, 18.36460, 17.31730, 16.26920, 15.26920, 14.35320, 13.54360, 12.85230, 12.28520, 11.84690, 11.54040, 11.36610, 11.32130, 11.39980, 11.59230, 11.88310, 12.25040, 12.66660, 13.09810, 13.50220, 13.82580, 14.01250)
for estiming 'a' and 'b' values according to x and y. 'a' and 'b' are in the range between 0 and 1.
However, when I used the nls command:
nls(y ~1/(pi*a*(1+((x-2.15646)/a)^2))+1/(pi*b*(1+((x-2.16355)/b)^2)), control = list(maxiter = 500), start=list(a=0.4,b=0.4))
The console reported the following error:
singular gradient
Can anyone explain to me why does the console print this message?
回答1:
This gives a better fit:
Before getting into the code (below), there are several issues with your model:
- Assuming this is proton NMR, the area under the peaks is proportional to the proton abundance (so, number of protons). Your model does not allow for this, essentially forcing all peaks to have the same area. This is the main reason for the poor fit. We can accommodate this easily by including a "height" factor for each peak.
- Your model assumes the peak positions. Why not just let the algorithm find the true peak positions?
- Your model does not account for baseline drift, which as you can see is quite severe in your dataset. We can accommodate this by adding a linear drift function to the model.
nls(...)
is poor for this type of modeling - the algorithms it uses are not especially robust. The default algorithm, Gauss-Newton, is especially poor when fitting offsetted data. So estimating p1
and p2
in a model with f(x-p1,x-p2)
nearly always fails.
A better approach is to use the exceptionally robust Levenberg-Marquardt algorithm implemented in nls.lm(...)
in package minpack
. This package is a bit harder to use but it is capable of dealing with problems inaccessible with nls(...)
. If you're going to do a lot of this, you should read the documentation to understand how this example works.
Finally, even with nls.lm(...)
the starting points have to be reasonable. In your model a and b are the peak widths. Clearly they must be comparable to or smaller than the difference in peak positions or the peaks will get smeared together. Your estimates of (a,b) = (0.4, 0.4) were much too large.
plot(x,y)
library(minpack.lm)
lorentzian <- function(par,x){
a <- par[1]
b <- par[2]
p1 <- par[3]
p2 <- par[4]
h1 <- par[5]
h2 <- par[6]
drift.a <- par[7]
drift.b <- par[8]
h1/(pi*a*(1+((x-p1)/a)^2))+h2/(pi*b*(1+((x-p2)/b)^2)) + drift.a + drift.b*x
}
resid <- function(par,obs,xx) {obs-lorentzian(par,xx)}
par=c(a=0.001,b=0.001, p1=2.157, p2=2.163, h1=1, h2=1, drift.a=0, drift.b=0)
lower=c(a=0,b=0,p1=0,p2=0,h1=0, h2=0,drift.a=NA,drift.b=NA)
nls.out <- nls.lm(par=par, lower=lower, fn=resid, obs=y, xx=x,
control = nls.lm.control(maxiter=500))
coef(nls.out)
# a b p1 p2 h1 h2 drift.a drift.b
# 1.679632e-03 1.879690e-03 2.156308e+00 2.163500e+00 4.318793e-02 8.199394e-02 -9.273083e+02 4.323897e+02
lines(x,lorentzian(coef(nls.out), x), col=2, lwd=2)
One last thing: the convention on SO is to wait a day before "accepting" an answer. The reason is that questions with accepted answers rarely get additional attention - once you accept an answer, no one else will look at it.
回答2:
Try using the reciprocals of a
and b
:
fm<-nls(y~1/(pi*(1/a)*(1+((x-2.15646)/(1/a))^2))+1/(pi*(1/b)*(1+((x-2.16355)/(1/b))^2)),
lower = 1, alg = "port",
control = list(maxiter = 500),
start = list(a = 1/.4, b = 1/.4))
which gives:
> 1/coef(fm)
a b
1.00000000 0.02366843
Unfortunately the model does not work very well as the graph at the bottom shows.
plot(y ~ x, pch = 20)
lines(x, fitted(fm), col = "red")
ADDED:
In his answer, @jlhoward provided a better fitting model based on 8 parameters. I will just point out that if we used his model with his starting values for a
, b
, p1
and p2
(we don't need starting values for the linear parameters if we specify alg = "plinear"
) then nls
would work too.
fo <- y ~ cbind(1/(pi*a*(1+((x-p1)/a)^2)), 1/(pi*b*(1+((x-p2)/b)^2)), 1, x)
start <- c(a = 0.001, b = 0.001, p1 = 2.157, p2 = 2.163)
fm2 <- nls(fo, start = start, alg = "plinear")
giving:
> coef(fm2)
a b p1 p2 .lin1
1.679635e-03 1.879682e-03 2.156308e+00 2.163500e+00 4.318798e-02
.lin2 .lin3 .lin.x
8.199364e-02 -9.273104e+02 4.323907e+02
Graph showing poor fit for fm
:
REVISED to add constraints.
来源:https://stackoverflow.com/questions/21435331/nls-function-does-not-perform-well