Question
I am running a LOESS regression in R and have come across warnings with some of my smaller data sets.
Warning messages:
1: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : pseudoinverse used at -2703.9
2: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : neighborhood radius 796.09
3: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : reciprocal condition number 0
4: In simpleLoess(y, x, w, span, degree = degree, parametric = parametric, : There are other near singularities as well. 6.1623e+005
These warnings are discussed in another post: Understanding loess errors in R.
It seems that these warnings are related to the span set for the LOESS regression. I am trying to apply the same methodology that was used with other data sets, where the acceptable range for the smoothing span was 0.3 to 0.6. In some cases I can adjust the span to avoid these issues, but for other data sets the span has to be increased beyond the acceptable range to avoid the warnings.
I am curious what specifically these warnings mean, and whether the regression is still usable (with a note that the warnings occurred) or completely invalid.
Here is an example of a data set that is having issues:
Period Value Total1 Total2
-2950 0.104938272 32.4 3.4
-2715 0.054347826 46 2.5
-2715 0.128378378 37 4.75
-2715 0.188679245 39.75 7.5
-3500 0.245014245 39 9.555555556
-3500 0.163120567 105.75 17.25
-3500 0.086956522 28.75 2.5
-4350 0.171038825 31.76666667 5.433333333
-3650 0.143798024 30.36666667 4.366666667
-4350 0.235588972 26.6 6.266666667
-3500 0.228840125 79.75 18.25
-4933 0.154931973 70 10.8452381
-4350 0.021428571 35 0.75
-3500 0.0625 28 1.75
-2715 0.160714286 28 4.5
-2715 0.110047847 52.25 5.75
-3500 0.176923077 32.5 5.75
-3500 0.226277372 34.25 7.75
-2715 0.132625995 188.5 25
Here is the code I am using:
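# Load the data (choose the CSV file interactively)
Analysis <- read.csv(file.choose(), header = TRUE)
plot(Value ~ Period, Analysis)
# Index that sorts by Period, so the lines are drawn left to right
a <- order(Analysis$Period)
# Weighted LOESS fit with the default span (0.75)
Analysis.lo <- loess(Value ~ Period, Analysis, weights = Total1)
pred <- predict(Analysis.lo, se = TRUE)
lines(Analysis$Period[a], pred$fit[a], col = "red", lwd = 3)
# Approximate 95% pointwise confidence band
lines(Analysis$Period[a], pred$fit[a] - qt(0.975, pred$df) * pred$se.fit[a], lty = 2)
lines(Analysis$Period[a], pred$fit[a] + qt(0.975, pred$df) * pred$se.fit[a], lty = 2)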
Thanks for your help, and please let me know if any additional information is necessary.
Answer 1:
The warnings are issued because the loess algorithm runs into numerical difficulties: Period has a few values that are repeated a relatively large number of times, as you can see from your plot and also with:
table(Analysis$Period)
In that respect, Period in fact behaves like a discrete variable (a factor) rather than a continuous one, as would be required for proper smoothing. Adding some jitter removes the warnings:
Analysis <- read.table(header = TRUE, text = "Period Value Total1 Total2
-2950 0.104938272 32.4 3.4
-2715 0.054347826 46 2.5
-2715 0.128378378 37 4.75
-2715 0.188679245 39.75 7.5
-3500 0.245014245 39 9.555555556
-3500 0.163120567 105.75 17.25
-3500 0.086956522 28.75 2.5
-4350 0.171038825 31.76666667 5.433333333
-3650 0.143798024 30.36666667 4.366666667
-4350 0.235588972 26.6 6.266666667
-3500 0.228840125 79.75 18.25
-4933 0.154931973 70 10.8452381
-4350 0.021428571 35 0.75
-3500 0.0625 28 1.75
-2715 0.160714286 28 4.5
-2715 0.110047847 52.25 5.75
-3500 0.176923077 32.5 5.75
-3500 0.226277372 34.25 7.75
-2715 0.132625995 188.5 25")
table(Analysis$Period)
# Add a small random offset to break the ties in Period
Analysis$Period <- jitter(Analysis$Period, factor = 0.2)
plot(Value ~ Period, Analysis)
a <- order(Analysis$Period)
Analysis.lo <- loess(Value ~ Period, Analysis, weights = Total1)
pred <- predict(Analysis.lo, se = TRUE)
lines(Analysis$Period[a], pred$fit[a], col = "red", lwd = 3)
# Approximate 95% pointwise confidence band
lines(Analysis$Period[a], pred$fit[a] - qt(0.975, pred$df) * pred$se.fit[a], lty = 2)
lines(Analysis$Period[a], pred$fit[a] + qt(0.975, pred$df) * pred$se.fit[a], lty = 2)
Increasing the span parameter has the effect of "squashing out", along the Period axis, the piles of repeated values where they occur; with small datasets you need a lot of squashing to compensate for the piling up of repeated Period values.
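One way to see why a larger span silences the warnings is to count how many distinct Period values fall strictly inside each local neighbourhood. The sketch below is only an approximation of what loess does internally: it assumes the neighbourhood holds about floor(span * n) nearest points, that points on its boundary get zero tricube weight, and it evaluates at the observed Period values rather than at the vertices loess actually uses. The helper names distinct_inside and count_by_target are hypothetical, not part of R.

```r
# Period values from the question's data set
period <- c(-2950, -2715, -2715, -2715, -3500, -3500, -3500, -4350, -3650,
            -4350, -3500, -4933, -4350, -3500, -2715, -2715, -3500, -3500, -2715)

# Number of distinct x values strictly inside the neighbourhood of x0.
# q is the (assumed) number of nearest points that define the radius.
distinct_inside <- function(x0, x, q) {
  d <- abs(x - x0)
  radius <- sort(d)[q]            # distance to the q-th nearest point
  length(unique(x[d < radius]))   # boundary points are excluded
}

# Apply the count at every distinct Period value, in increasing order
count_by_target <- function(span, x) {
  q <- floor(span * length(x))    # assumption: roughly span * n points
  sapply(sort(unique(x)), distinct_inside, x = x, q = q)
}

n_default <- count_by_target(0.75, period)  # loess() default span
n_wide    <- count_by_target(0.95, period)  # a much larger span

print(rbind(target = sort(unique(period)), n_default, n_wide))
```

With the default span, the two rightmost targets (-2950 and -2715) see only two distinct Period values, fewer than the three that a local quadratic (degree = 2, the default) needs, which is consistent with the pseudoinverse warning being reported near -2703.9. At span = 0.95 every target sees at least three.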
From a practical viewpoint, I would generally still trust the regression, possibly after examining the graphical output. But I would definitely not increase span to achieve the squashing: it is a lot better to use a tiny amount of jitter for that purpose; span should be dictated by other considerations, such as the overall spread of your Period data.
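To check on your own data whether a given amount of jitter is enough, the warnings from the two fits can be collected and compared instead of read off the console. This is a minimal sketch: collect_warnings is a hypothetical helper (not part of R), the seed is arbitrary, and because jitter is random the result may differ from one seed to another.

```r
# Columns used by the fit, copied from the question's data set
Analysis <- data.frame(
  Period = c(-2950, -2715, -2715, -2715, -3500, -3500, -3500, -4350, -3650,
             -4350, -3500, -4933, -4350, -3500, -2715, -2715, -3500, -3500, -2715),
  Value  = c(0.104938272, 0.054347826, 0.128378378, 0.188679245, 0.245014245,
             0.163120567, 0.086956522, 0.171038825, 0.143798024, 0.235588972,
             0.228840125, 0.154931973, 0.021428571, 0.0625, 0.160714286,
             0.110047847, 0.176923077, 0.226277372, 0.132625995),
  Total1 = c(32.4, 46, 37, 39.75, 39, 105.75, 28.75, 31.76666667, 30.36666667,
             26.6, 79.75, 70, 35, 28, 28, 52.25, 32.5, 34.25, 188.5)
)

# Evaluate an expression and return the messages of any warnings it raised
collect_warnings <- function(expr) {
  msgs <- character(0)
  withCallingHandlers(
    expr,
    warning = function(w) {
      msgs <<- c(msgs, conditionMessage(w))
      invokeRestart("muffleWarning")  # keep going, but do not print it
    }
  )
  msgs
}

# Fit on the raw Period values
w_raw <- collect_warnings(loess(Value ~ Period, Analysis, weights = Total1))

# Fit on jittered Period values, same span
set.seed(1)  # arbitrary seed, only for reproducibility
Jittered <- transform(Analysis, Period = jitter(Period, factor = 0.2))
w_jit <- collect_warnings(loess(Value ~ Period, Jittered, weights = Total1))

cat("warnings without jitter:", length(w_raw), "\n")
cat("warnings with jitter:   ", length(w_jit), "\n")
```

If the jittered fit still warns, trying a slightly larger factor is more in the spirit of the advice above than widening span.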
Source: https://stackoverflow.com/questions/38864458/loess-warnings-errors-related-to-span-in-r