问题
I am having some issues trying to compete a LOESS regression with a data set. I have been able to properly create the line, but I am unable to get it to plot correctly.
I ran through the data like this.
animals.lo <- loess(X15p5 ~ Period, animals, weights = n.15p5)
animals.lo
summary(animals.lo)
plot(X15p5~ Period, animals)
lines(animals$X15p5, animals.lo, col="red")
At this point I received an error
"Error in xy.coords(x, y) : 'x' and 'y' lengths differ"
I searched around and read that this issue could be due to the points needing to be ordered, so I proceeded.
a <- order(animals$Period)
lines(animals$X15p5[a], animals.lo$Period[a], col="red", lwd=3)
There were no errors at this point, but the LOESS line was still not showing up in the plot. The points were displayed correctly, but not the line.
This is similar to the data set I am using...
structure(list(Site = c("Cat", "Dog", "Bear", "Chicken", "Cow",
"Bird", "Tiger", "Lion", "Leopard", "Wolf", "Puppy", "Kitten",
"Emu", "Ostrich", "Elephant", "Sheep", "Goat", "Fish", "Iguana",
"Monkey", "Gorilla", "Baboon", "Lemming", "Mouse", "Rat", "Hamster",
"Eagle", "Parrot", "Crow", "Dove", "Falcon", "Hawk", "Sparrow",
"Kite", "Chimpanzee", "Giraffe", "Bear", "Donkey", "Mule", "Horse",
"Zebra", "Ox", "Snake", "Cobra", "Iguana", "Lizard", "Fly", "Mosquito",
"Llama", "Butterfly", "Moth", "Worm", "Centipede", "Unicorn",
"Pegasus", "Griffin", "Ogre", "Monster", "Demon", "Witch", "Vampire",
"Mummy", "Ghoul", "Zombie"), Region = c(6L, 4L, 4L, 5L, 7L, 6L,
2L, 4L, 6L, 7L, 7L, 4L, 6L, 4L, 4L, 4L, 4L, 3L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 8L, 4L, 6L, 6L,
4L, 2L, 7L, 4L, 2L, 2L, 7L, 3L, 4L, 7L, 4L, 4L, 4L, 7L, 7L, 4L,
4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 2L, 8L), Period = c(-2715, -3500,
-3500, -4933.333333, -2715, -2715, -2715, -3500, -2715, -4350,
-3500, -3500, -2950, -4350, -3650, -3500, -3500, -2715, -3650,
-4350, -3500, -3500, -3400, -4350, -3500, -3500, -4350, -3900,
-3808.333333, -4233.333333, -3500, -3900, -3958.333333, -3900,
-3500, -3500, -3500, -2715, -3650, -2715, -2715, -2715, -2715,
-3500, -2715, -2715, -3500, -4350, -3650, -3650, -4350, -5400,
-3500, -3958.333333, -3400, -3400, -4350, -3600, -4350, -3650,
-3500, -2715, -5400, -3500), Value = c(0.132625995, 0.163120567,
0.228840125, 0.154931973, 0.110047847, 0.054347826, 0.188679245,
0.245014245, 0.128378378, 0.021428571, 0.226277372, 0.176923077,
0.104938272, 0.17659805, 0.143798024, 0.086956522, 0.0625, 0.160714286,
0, 0.235588972, 0, 0, 0.208333333, 0.202247191, 0.364705882,
0.174757282, 0, 0.4, 0.1, 0.184027778, 0.232876712, 0.160493827,
0.74702381, 0.126984127, 0.080645161, 0.06557377, 0, 0.057692308,
0.285714286, 0.489361702, 0.108695652, 0.377777778, 0, 0.522727273,
0.024390244, 0.097560976, 0.275, 0, 0.0625, 0.255319149, 0.135135135,
0.216216216, 0.222222222, 0.296296296, 0.222222222, 0.146341463,
0.09375, 0.125, 0.041666667, 0.078947368, 0.2, 0.137931034, 0.571428571,
0.142857143), Sample_size = c(188.5, 105.75, 79.75, 70, 52.25,
46, 39.75, 39, 37, 35, 34.25, 32.5, 32.4, 30.76666667, 30.36666667,
28.75, 28, 28, 28, 26.6, 25, 25, 24, 22.25, 21.25, 20.6, 20,
20, 20, 19.2, 18.25, 18, 18, 16.8, 15.5, 15.25, 15, 13, 12.6,
11.75, 11.5, 11.25, 11, 11, 10.25, 10.25, 10, 10, 9.6, 9.4, 9.25,
9.25, 9, 9, 9, 8.2, 8, 8, 8, 7.6, 7.5, 7.25, 7, 7), Sample_sub = c(25,
17.25, 18.25, 10.8452381, 5.75, 2.5, 7.5, 9.555555556, 4.75,
0.75, 7.75, 5.75, 3.4, 5.433333333, 4.366666667, 2.5, 1.75, 4.5,
0, 6.266666667, 0, 0, 5, 4.5, 7.75, 3.6, 0, 8, 2, 3.533333333,
4.25, 2.888888889, 13.44642857, 2.133333333, 1.25, 1, 0, 0.75,
3.6, 5.75, 1.25, 4.25, 0, 5.75, 0.25, 1, 2.75, 0, 0.6, 2.4, 1.25,
2, 2, 2.666666667, 2, 1.2, 0.75, 1, 0.333333333, 0.6, 1.5, 1,
4, 1)), .Names = c("Site", "Region", "Period", "Value", "Sample_size",
"Sample_sub"), class = "data.frame", row.names = c(NA, -64L))
I have been working for this a while and trying to read up as much as I can, but I haven't been able to make any additional headway. Any advice or guidance would be greatly appreciated.
Follow-up on adding confidence interval to plot
I have been trying to add in confidence intervals following another example found on the site on this page How to get the confidence intervals for LOWESS fit using R? .
The example given on that page is:
plot(cars)
plx<-predict(loess(cars$dist ~ cars$speed), se=T)
lines(cars$speed,plx$fit)
lines(cars$speed,plx$fit - qt(0.975,plx$df)*plx$se, lty=2)
lines(cars$speed,plx$fit + qt(0.975,plx$df)*plx$se, lty=2)
I adapted that as this:
plot(X15p5 ~ Period, animals)
animals.lo2<-predict(loess(animals$X15p5 ~ animals$Period), se=T)
a <- order(animals$Period)
lines(animals$Period[a],animals.lo2$fit, col="red", lwd=3)
lines(animals$Period[a],animals.lo2$fit - qt(0.975,animals.lo2$df)*animals.lo2$se, lty=2)
lines(animals$Period[a],animals.lo2$fit + qt(0.975,animals.lo2$df)*animals.lo2$se, lty=2)
Although this does provide confidence intervals, the regression line is all wrong. I'm not sure if it is an issue with the predict
function, or another issue. Thanks again!
回答1:
correct code
I searched around and read that this issue could be due to the points needing to be ordered, so I proceeded.
No, no. The ordering issue is not related to the error you see. To overcome the error, You need to replace
lines(animals$X15p5, animals.lo, col="red")
with
lines(animals$Period, animals.lo$fitted, col="red")
Here are reasons:
loess
returns a list of objects, not a single vector. Seestr(animals.lo)
ornames(animals.lo)
.- why do you use
animals$X15p5
as x-axis? You fit your model:X15p5 ~ Period
, so x-axis should bePeriod
.
about reordering
You need to do ordering, because by default, R lines up points in order. Take this as an example:
set.seed(0); x <- runif(100, 0, 10) ## x is not in order
set.seed(1); y <- sqrt(x) ## plot curve y = sqrt(x)
par(mfrow = c(1,2))
plot(x, y, type = "l") ## this is a mess!!
reorder <- order(x)
plot(x[reorder], y[reorder], type = "l") ## this is nice
Similarly, do:
a <- order(animals$Period)
lines(animals$Period[a], animals.lo$fitted[a], col="red", lwd=3)
follow-up on confidence interval
Try this:
plot(X15p5 ~ Period, animals)
animals.lo <- loess(X15p5 ~ Period, animals)
pred <- predict(animals.lo, se = TRUE)
a <- order(animals$Period)
lines(animals$Period[a], pred$fit[a], col="red", lwd=3)
lines(animals$Period[a], pred$fit[a] - qt(0.975, pred$df)*pred$se[a],lty=2)
lines(animals$Period[a], pred$fit[a] - qt(0.975, pred$df)*pred$se[a],lty=2)
You forgot about reordering again. You need to reorder both fitted values, as well as standard errors.
Now, the dist ~ speed
model for cars
data has no need for reordering. Because:
is.unsorted(cars$speed) ## FALSE
Yes, data are already sorted there.
Note I have made two other changes to your code:
- I have separated
loess
call andpredict
call; Maybe you don't need to do this, but it is generally a good habit to separate model fitting and model prediction, and keeps a copy of both objects. - I have changed
loess(animals$X15p5 ~ animals$Period)
toloess(X15p5 ~ Period, animals)
. It is a bad habit to use$
sign in specifying model formula. I have another answer at https://stackoverflow.com/a/37307270/4891738 showing the draw back of such style. You can read on the "update" section over there. I have used theglm
as an example, but forlm
,glm
,loess
, things are the same.
来源:https://stackoverflow.com/questions/37499924/problems-displaying-loess-regression-line-and-confidence-interval