I am trying to run cv.glm on a linear model; however, each time I do, I get the error:
Error in model.frame.default(formula = lindata$Y ~ 0 + lindata$HomeAdv + :
vari
The error is caused by a mistake in the way you specify the formula.
This will produce the error:
library(boot)  # provides cv.glm
mod <- glm(mtcars$cyl ~ mtcars$mpg + .,
           data = mtcars, na.action = "na.exclude")
cv.glm(mtcars, mod, K = 11)  # nrow(mtcars) is 32, not a multiple of 11, so the folds are of unequal size
This does not:
mod <- glm(cyl ~ ., data = mtcars)
cv.glm(mtcars, mod, K=11)
Nor does this:
mod <- glm(cyl ~ mpg + disp, data = mtcars)
cv.glm(mtcars, mod, K=11)
What happens is this: when you specify a variable as mtcars$cyl, it always has as many rows as the original dataset. cv.glm partitions the data frame into K parts and refits the model on each resampled subset. During the refit, any variable written in the form data.frame$var is still evaluated at the original (non-partitioned) length, while the others (here, those supplied by the dot) have the partitioned length. The lengths no longer match, and the refit fails.
So you have to refer to variables by their bare column names in the formula (without $), so that they are looked up in the data argument.
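The length mismatch can be reproduced without cv.glm. The following is only a sketch: it imitates a single fold by dropping three rows by hand (the object name fold is made up for illustration, and cv.glm chooses its folds at random rather than like this), then builds the model frame with and without $.

```r
# Sketch: mimic one cross-validation fold by dropping three rows.
fold <- mtcars[-(1:3), ]   # hypothetical training fold with 29 rows

# Bare column names are looked up in `fold`, so every variable has 29 rows.
mf_ok <- model.frame(cyl ~ mpg + disp, data = fold)
nrow(mf_ok)

# mtcars$cyl ignores `data` and always returns all 32 values,
# while mpg is taken from the 29-row fold, so the lengths disagree.
length(mtcars$cyl)
msg <- tryCatch(
  {
    model.frame(mtcars$cyl ~ mpg, data = fold)
    "no error"
  },
  error = function(e) conditionMessage(e)
)
msg  # the error message raised by model.frame
```

The same thing happens inside every refit that cv.glm performs, which is why the first call to glm on the full data succeeds and only the cross-validation step fails.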
Other advice on the formula:
Avoid mixing explicitly named variables with the dot, because you end up including variables twice. The dot stands for all variables in the data frame except those on the left of the tilde.
Why do you add a zero? If the aim is to remove the intercept, it does that, and - 1 is an equivalent and common way to write it. However, removing the intercept is bad practice in my opinion.
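Both points can be checked on mtcars. This is a small sketch using only base R; the variable names are chosen for illustration and do not come from the question.

```r
# Both spellings drop the intercept and give the same coefficients.
fit_zero  <- lm(mpg ~ 0 + wt, data = mtcars)
fit_minus <- lm(mpg ~ wt - 1, data = mtcars)
same_coef <- isTRUE(all.equal(coef(fit_zero), coef(fit_minus)))
same_coef

# The dot expands to every column of `data` that is not on the left
# of the tilde, so cyl itself is excluded from the predictors here.
dot_terms <- attr(terms(cyl ~ ., data = mtcars), "term.labels")
dot_terms
```

Inspecting the expanded term labels like this before fitting is a quick way to see which predictors a formula containing the dot actually uses.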