Cross Validating step functions in R

Submitted by 放荡痞女 on 2019-12-13 06:34:25

Question


I am trying to compute cross-validation errors for step functions, but I get an error:

library(boot)
library(ISLR)
attach(Wage)
set.seed(5082)
cv.error <- rep (0,12)
for (i in 2:13){
    step.fit = glm(wage~cut(age,i), data = Wage)
    cv.error[i] <- cv.glm(Wage ,step.fit, K= 10)$delta [1]
}

Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) :
cut(age, i) has new levels (17.9,43.5], (43.5,69.1]

I can get the error estimate from cv.glm()$delta[1] if, instead of letting cut() generate the breaks from the index i, I pass specific breaks:

fit <- glm(wage ~ cut(age, breaks = c(17.9, 33.5, 49, 64.5, 80.1)), data = Wage)
cv.error <- cv.glm(Wage, fit, K = 10)$delta[1]

Even though these are the exact same breaks cut(age,4) makes.

Can anyone explain what is going on, or how to fix the error?

My goal is to try to find errors from 12 different step models and pick the best one based on the cv.glm()$delta error.


Answer 1:


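The problem is that cut(age, i) exists only as an inline expression inside your glm() formula and is not a column of the Wage data set used for validation. The formula is re-evaluated on each fold, and cut() with a single number derives its breaks from the range of whatever data it receives, so a fold can end up with factor levels the fitted model has never seen. Storing the factor as a column of Wage first fixes that: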

library(boot)
library(ISLR)
data(Wage) # using attach is a bad practice
set.seed(5082)
cv.error <- rep (0,12)
for (i in 2:13){
  Wage$tmp <- cut(Wage$age,i)
  step.fit = glm(wage~tmp, data = Wage)
  cv.error[i] <- cv.glm(Wage ,step.fit, K= 10)$delta [1]
}

cv.error

[1] 0.000 1733.815 1682.731 1637.200 1631.049 1623.069 1613.099 1600.413 1613.127 1603.581 1603.601 1604.730 1602.462

Note that the first value is 0 only because the values of i start at 2 so nothing was ever written to the first element.
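
Because of that placeholder zero, calling which.min() on the raw vector would select element 1 rather than a real model. A minimal sketch of picking the best number of intervals, assuming the error values printed above and using the illustrative names cv.by.cuts and best.cuts (not from the original answer):

```r
# CV errors copied from the output above; element 1 is the unused placeholder.
cv.error <- c(0.000, 1733.815, 1682.731, 1637.200, 1631.049, 1623.069,
              1613.099, 1600.413, 1613.127, 1603.581, 1603.601, 1604.730,
              1602.462)

# Drop the placeholder and label each error with its number of intervals.
cv.by.cuts <- setNames(cv.error[-1], 2:13)

# Number of intervals with the smallest estimated test error.
best.cuts <- as.integer(names(which.min(cv.by.cuts)))
best.cuts
```

Labelling the errors by number of intervals avoids having to remember that position i in the vector corresponds to i cuts.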




Answer 2:


I looked into how to get the labels from cut() output and found a helpful note at the end of the documentation (?cut):

## one way to extract the breakpoints
labs <- levels(cut(aaa, 3))
cbind(lower = as.numeric( sub("\\((.+),.*", "\\1", labs) ),
      upper = as.numeric( sub("[^,]*,([^]]*)\\]", "\\1", labs) ))

So putting that to use:

library(boot)
library(ISLR)
data(Wage)
set.seed(5082)
cv.error <- rep (0,12)
for (i in 2:13){
  # Wage is not attached here, so refer to the column explicitly
  labs <- levels(cut(Wage$age, i))
  breaks <- unique(c(as.numeric(sub("\\((.+),.*", "\\1", labs)),
                    as.numeric(sub("[^,]*,([^]]*)\\]", "\\1", labs))))
  step.fit <- glm(wage ~ cut(age, breaks), data = Wage)
  cv.error[i] <- cv.glm(Wage ,step.fit, K=10)$delta[1]
}

cv.error
 [1]    0.000 1733.815 1682.731 1637.200 1631.049 1623.069 1613.099 1600.413 1613.127 1603.581 1603.601
[12] 1604.730 1602.462
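
As an alternative to parsing the labels with regular expressions, the breaks can be computed directly. The sketch below is not from the original answers: equal_width_breaks is a hypothetical helper that follows the rule described in ?cut for a single number (equal-width pieces, outer limits moved out by 0.1% of the range), and the ages are made-up values standing in for Wage$age. It also shows why fixed breaks avoid the "new levels" error:

```r
# Hypothetical helper: equal-width breaks for k intervals, with the outer
# limits widened by 0.1% of the range, as ?cut describes for a single number.
equal_width_breaks <- function(x, k) {
  rx <- range(x, na.rm = TRUE)
  dx <- diff(rx)
  breaks <- seq(rx[1], rx[2], length.out = k + 1)
  breaks[c(1, k + 1)] <- c(rx[1] - dx / 1000, rx[2] + dx / 1000)
  breaks
}

# Made-up ages standing in for Wage$age (assumption, not the real data).
ages <- c(18, 22, 25, 31, 38, 44, 50, 57, 63, 69, 74, 80)
fold <- ages[ages < 70]  # a fold that happens to miss the oldest workers

# Data-dependent breaks: are the fold's levels the same as the full sample's?
same.auto <- identical(levels(cut(ages, 4)), levels(cut(fold, 4)))

# Fixed breaks: are the levels the same regardless of which rows are passed?
b <- equal_width_breaks(ages, 4)
same.fixed <- identical(levels(cut(ages, b)), levels(cut(fold, b)))

c(auto = same.auto, fixed = same.fixed)
```

Inside the loop, this would presumably replace the two regex lines with something like breaks <- equal_width_breaks(Wage$age, i), leaving the glm() and cv.glm() calls unchanged.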


Source: https://stackoverflow.com/questions/42190337/cross-validating-step-functions-in-r
