LOESS warnings/errors related to span in R

滥情空心 2021-01-13 10:29

I am running a LOESS regression in R and have come across warnings with some of my smaller data sets.

Warning messages:

1: In simpleLoess(y,

1 Answer
  • 2021-01-13 10:52

    The warnings arise because the loess algorithm runs into numerical difficulties: Period takes only a few distinct values, each repeated a relatively large number of times, as you can see from your plot and from:

    table(Analysis$Period)
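
    To make the tie structure concrete, here is a minimal self-contained sketch. It assumes only the Period column of the data set used in this answer, copied into a plain vector; the `counts` name is just for illustration.

    ```r
    # Period column copied from the data set in this answer
    Period <- c(-2950, -2715, -2715, -2715, -3500, -3500, -3500, -4350, -3650,
                -4350, -3500, -4933, -4350, -3500, -2715, -2715, -3500, -3500,
                -2715)

    counts <- table(Period)
    counts                         # observations per distinct Period
    length(counts)                 # number of distinct Period values
    max(counts) / length(Period)   # share of the data at the most common Period
    ```

    With this data, 6 distinct values carry all 19 observations, and 13 of them sit at just two Periods (-3500 and -2715).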
    

    In that respect, Period behaves like a discrete variable (a factor) rather than the continuous one that proper smoothing requires. Adding some jitter removes the warnings:

    Analysis <- read.table(header = TRUE, text="Period  Value   Total1  Total2
    -2950   0.104938272 32.4    3.4
    -2715   0.054347826 46  2.5
    -2715   0.128378378 37  4.75
    -2715   0.188679245 39.75   7.5
    -3500   0.245014245 39  9.555555556
    -3500   0.163120567 105.75  17.25
    -3500   0.086956522 28.75   2.5
    -4350   0.171038825 31.76666667 5.433333333
    -3650   0.143798024 30.36666667 4.366666667
    -4350   0.235588972 26.6    6.266666667
    -3500   0.228840125 79.75   18.25
    -4933   0.154931973 70  10.8452381
    -4350   0.021428571 35  0.75
    -3500   0.0625  28  1.75
    -2715   0.160714286 28  4.5
    -2715   0.110047847 52.25   5.75
    -3500   0.176923077 32.5    5.75
    -3500   0.226277372 34.25   7.75
    -2715   0.132625995 188.5   25")
    
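    # Period has only a few distinct values, each repeated several times
    table(Analysis$Period)
    # Add a small amount of noise to break the ties
    Analysis$Period <- jitter(Analysis$Period, factor = 0.2)

    plot(Value ~ Period, Analysis)
    a <- order(Analysis$Period)  # plotting order along the Period axis
    Analysis.lo <- loess(Value ~ Period, Analysis, weights = Total1)
    pred <- predict(Analysis.lo, se = TRUE)
    # Fitted curve with pointwise 95% confidence bands
    lines(Analysis$Period[a], pred$fit[a], col = "red", lwd = 3)
    lines(Analysis$Period[a], pred$fit[a] - qt(0.975, pred$df) * pred$se.fit[a], lty = 2)
    lines(Analysis$Period[a], pred$fit[a] + qt(0.975, pred$df) * pred$se.fit[a], lty = 2)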
    

    Increasing the span parameter widens each local neighbourhood, which "squashes out", along the Period axis, the piles of repeated values where they occur. With small data sets you need a lot of squashing to compensate for the piling up of repeated Periods.
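
    The effect of span on the piles can be sketched roughly as follows. This is not loess's internal code: it assumes, following the description in ?loess, that for span < 1 the neighbourhood holds about span * n of the nearest points. The helper `neighbourhood_summary` and the choice of -3500 as the target point are hypothetical, for illustration only.

    ```r
    # Period column copied from the data set in this answer
    Period <- c(-2950, -2715, -2715, -2715, -3500, -3500, -3500, -4350, -3650,
                -4350, -3500, -4933, -4350, -3500, -2715, -2715, -3500, -3500,
                -2715)

    # Hypothetical helper: approximate the neighbourhood of x0 as the
    # k = floor(span * n) nearest points (an assumption, not loess internals)
    neighbourhood_summary <- function(x0, x, span) {
      k <- max(1, min(length(x), floor(span * length(x))))
      d <- abs(x - x0)
      radius <- sort(d)[k]              # distance to the k-th nearest point
      inside <- x[d <= radius]          # points within that radius (ties included)
      c(span = span, k = k, radius = radius,
        distinct = length(unique(inside)))
    }

    # Neighbourhoods around the largest pile (Period = -3500) for several spans
    res <- t(sapply(c(0.3, 0.5, 0.75, 1),
                    function(s) neighbourhood_summary(-3500, Period, s)))
    res
    ```

    Under that assumption, a span of 0.3 around -3500 gives a neighbourhood of radius 0 containing a single distinct Period, so there is nothing to fit a local polynomial through; only larger spans pull in enough distinct values.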

    From a practical viewpoint, I would generally still trust the regression, possibly after examining the graphical output. But I would definitely not increase span to achieve the squashing: it is much better to use a tiny amount of jitter for that purpose. span should be dictated by other considerations, such as the overall spread of your Period data.
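
    To check that the jitter really is tiny relative to the data, one can compare it with the spread of Period. A minimal sketch, assuming the default rule documented in ?jitter (amount = factor/5 * d, where d is the smallest gap between adjacent distinct values); the seed is arbitrary and only there to make the run repeatable.

    ```r
    # Period column copied from the data set in this answer
    Period <- c(-2950, -2715, -2715, -2715, -3500, -3500, -3500, -4350, -3650,
                -4350, -3500, -4933, -4350, -3500, -2715, -2715, -3500, -3500,
                -2715)

    gaps   <- diff(sort(unique(Period)))  # gaps between adjacent distinct Periods
    amount <- 0.2 / 5 * min(gaps)         # expected jitter bound for factor = 0.2

    set.seed(1)                           # arbitrary seed, for repeatability only
    jittered  <- jitter(Period, factor = 0.2)
    max_shift <- max(abs(jittered - Period))

    c(amount = amount, max_shift = max_shift, spread = diff(range(Period)))
    anyDuplicated(jittered)               # 0 means no ties remain
    ```

    Here the noise is bounded by the smallest gap between distinct Periods scaled down by factor/5, so it breaks the ties without moving any point noticeably on the scale of the plot.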
