Question
I am trying to get cross-validation errors for step functions, but I get an error:
library(boot)
library(ISLR)
attach(Wage)
set.seed(5082)
cv.error <- rep (0,12)
for (i in 2:13){
step.fit = glm(wage~cut(age,i), data = Wage)
cv.error[i] <- cv.glm(Wage ,step.fit, K= 10)$delta [1]
}
Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
cut(age, i) has new levels (17.9,43.5], (43.5,69.1]
I can get the error estimate from cv.glm()$delta[1] if, instead of letting cut() generate the breaks from the index i, I use specific breaks:
fit <- glm(wage ~ cut(age, breaks = c(17.9, 33.5, 49, 64.5, 80.1)), data = Wage)
cv.error <- cv.glm(Wage, fit, K = 10)$delta[1]
This works even though these are exactly the same breaks that cut(age, 4) makes.
Can anyone explain what is going on, or how to fix the error?
My goal is to fit 12 different step models and pick the best one based on the cv.glm()$delta error.
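Answer 1:
The problem is that cut(age, i) exists only as an inline expression within your glm() call and is not part of the Wage data set used for validation. cv.glm() therefore re-evaluates it on each fold, and because cut() with a number of intervals derives its break points from the range of the data it is given, a fold can end up with levels the fitted model has never seen. We can fix that by storing the factor as a column first: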
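library(boot)
library(ISLR)
data(Wage) # load the data explicitly; using attach() is bad practice
set.seed(5082)
cv.error <- rep(0, 13) # indexed by number of intervals, so slot 1 stays unused
for (i in 2:13) {
  # store the step factor as a column so every fold sees the same levels
  Wage$tmp <- cut(Wage$age, i)
  step.fit <- glm(wage ~ tmp, data = Wage)
  cv.error[i] <- cv.glm(Wage, step.fit, K = 10)$delta[1]
}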
cv.error
[1] 0.000 1733.815 1682.731 1637.200 1631.049 1623.069 1613.099 1600.413 1613.127 1603.581 1603.601 1604.730 1602.462
Note that the first value is 0 only because i starts at 2, so nothing was ever written to the first element.
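To see why the levels can differ between folds, here is a minimal sketch on made-up ages (not the Wage data). It assumes only base R and shows that cut() with a number of intervals picks its breaks from the range of whatever vector it receives, while explicit breaks give the same levels on any subset. The vectors ages and brks are illustrative.

```r
# Made-up ages for illustration only (not taken from Wage)
ages <- c(18, 25, 33, 41, 52, 60, 69, 80)

# cut() with a number of intervals derives its breaks from the range of its input,
# so dropping the oldest observation changes the resulting levels
full_levels <- levels(cut(ages, 2))
part_levels <- levels(cut(ages[ages < 80], 2))
identical(full_levels, part_levels)   # FALSE

# With explicit breaks, the levels depend only on the breaks themselves
brks <- c(17, 49, 81)
same_levels <- identical(levels(cut(ages, brks)),
                         levels(cut(ages[ages < 80], brks)))
same_levels                           # TRUE
```

A held-out fold that happens to miss the extreme ages behaves like the subset here, which would explain the "new levels" message in the question.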
Answer 2:
I looked into how to get the labels from cut() output and found a helpful note at the end of the documentation (?cut):
## one way to extract the breakpoints
labs <- levels(cut(aaa, 3))
cbind(lower = as.numeric( sub("\\((.+),.*", "\\1", labs) ),
upper = as.numeric( sub("[^,]*,([^]]*)\\]", "\\1", labs) ))
So putting that to use:
library(boot)
library(ISLR)
data(Wage)
set.seed(5082)
cv.error <- rep(0, 13)
for (i in 2:13) {
  # recover the break points cut() chose on the full data from its labels
  labs <- levels(cut(Wage$age, i))
  breaks <- unique(c(as.numeric(sub("\\((.+),.*", "\\1", labs)),
                     as.numeric(sub("[^,]*,([^]]*)\\]", "\\1", labs))))
  # explicit breaks give the same levels in every fold
  step.fit <- glm(wage ~ cut(age, breaks), data = Wage)
  cv.error[i] <- cv.glm(Wage, step.fit, K = 10)$delta[1]
}
cv.error
[1] 0.000 1733.815 1682.731 1637.200 1631.049 1623.069 1613.099 1600.413 1613.127 1603.581 1603.601
[12] 1604.730 1602.462
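Since the goal in the question is to pick the best step model, one possible way to read off the winner is sketched below. It reuses the CV errors printed above, drops the unused first slot, and names each entry by its number of intervals; the variable names cv.by.cuts and best are illustrative.

```r
# CV errors reported above for 2 to 13 intervals (unused first slot dropped)
cv.by.cuts <- c(1733.815, 1682.731, 1637.200, 1631.049, 1623.069, 1613.099,
                1600.413, 1613.127, 1603.581, 1603.601, 1604.730, 1602.462)
names(cv.by.cuts) <- 2:13

# which.min() returns the position of the smallest error; its name is the interval count
best <- as.integer(names(which.min(cv.by.cuts)))
best   # 8
```

Naming the entries avoids an off-by-one slip when the placeholder 0 in the first element is removed. The errors for 8 or more intervals are close to one another, so the minimum may shift with a different seed.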
Source: https://stackoverflow.com/questions/42190337/cross-validating-step-functions-in-r