问题
I'm using loess
to calculate the residuals. I expect for the following (small series) to find a big value of the residual for the third point
y <- c(5814, 6083, 17764, 6110, 6556)
x <- c(14564, 14719, 14753, 14754, 15086)
> residuals(loess(y ~ x))
1 2 3 4 5
2.728484e-12 -9.094947e-13 3.637979e-12 3.637979e-12 0.000000e+00
In particular, loess
gives the following output:
> loess(y ~ x)
Call:
loess(formula = y ~ x)
Number of Observations: 5
Equivalent Number of Parameters: 5
Residual Standard Error: Inf
Warning messages:
1: In simpleLoess(y, x, w, span, degree, parametric, drop.square, normalize, :
span too small. fewer data values than degrees of freedom.
2: In simpleLoess(y, x, w, span, degree, parametric, drop.square, normalize, :
pseudoinverse used at 14561
3: In simpleLoess(y, x, w, span, degree, parametric, drop.square, normalize, :
neighborhood radius 191.61
4: In simpleLoess(y, x, w, span, degree, parametric, drop.square, normalize, :
reciprocal condition number 0
5: In simpleLoess(y, x, w, span, degree, parametric, drop.square, normalize, :
There are other near singularities as well. 1.1263e+005
There is probably a (very simple) reason that I'm missing now, but the above seems strange to me... why it "doesn't work" as expected in my case?
EDIT:
thanks to @Gavin Simpson who suggested me this link I found out in the package MASS
the function rlm
which gives exactly what I was hoping.In the mean time I also tried to use lowess
with several iteration and its fitted values converged actually "better" (in this case) to my data:
library(MASS)
method_rlm <- rlm(x=x,y=y)
method_lowess <- lowess(x,y, iter=7, f=1)
df<-data.frame(x=x, y=y, rlm=method_rlm$fitted.values, lowess=method_lowess$y)
library(ggplot2)
ggplot(df) +
geom_line(aes(x, y), color="red") +
geom_line(aes(x, rlm), color="blue") +
geom_line(aes(x, lowess), color="green") +
geom_point(aes(x, y), color="red")
I also had a look to some timings and the difference is huge..
> microbenchmark(rlm(x=x,y=y), lowess(x,y, iter=7, f=1), times=1000)
Unit: microseconds
expr min lq median uq max neval
rlm(x = x, y = y) 6445.269 6663.972 6906.1350 9417.1895 271494.006 1000
lowess(x, y, iter = 7, f = 1) 169.099 193.046 238.0085 273.9295 3900.493 1000
do you think this difference will be worth? I have million of small series like that ( with 5 to 20 points maximum and similar type of outliers)
回答1:
There are 5 observations in the data and loess()
is fitting a model with 5 degrees of freedom, hence it is able to fit the observed data perfectly and thence the small (effectively 0) residuals. loess()
has sufficient freedom to interpolate your data exactly, but is not a useful summary of the data. Fit a simpler model.
来源:https://stackoverflow.com/questions/17840593/residuals-calculation-in-series-with-few-points-max-20