cv.glm variable lengths differ

醉话见心 2021-01-23 06:58

I am trying to run cv.glm on a linear model; however, each time I do, I get the error:

Error in model.frame.default(formula = lindata$Y ~ 0 + lindata$HomeAdv +  : 
vari         


        
1 answer
  • 2021-01-23 07:39

    The error is caused by the way you specify the formula.

    This will produce the error:

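    library(boot)  # provides cv.glm

    # Variables written as mtcars$... are not taken from the `data` argument
    mod <- glm(mtcars$cyl ~ mtcars$mpg + .,
                data = mtcars, na.action = "na.exclude")
    
    cv.glm(mtcars, mod, K=11) # 11-fold CV; nrow(mtcars) is 32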
    

    This does not:

    mod <- glm(cyl ~ ., data = mtcars)
    
    cv.glm(mtcars, mod, K=11)
    

    Nor does this:

    mod <- glm(cyl ~ mpg + disp, data = mtcars)
    
    cv.glm(mtcars, mod, K=11)
    

    When you write a variable as mtcars$cyl, it is always taken from the original data frame, so it has as many rows as the full dataset. cv.glm partitions the data frame into K parts and refits the model on each resampled subset. During that refit, any variable written as data.frame$var keeps the original (unpartitioned) length, while the others (those supplied by . or by bare names) have the length of the subset. That mismatch is what "variable lengths differ" refers to.

    So you have to use bare variable names in the formula (without $) and let the data argument supply them.
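
    The mismatch can be reproduced without cv.glm. The following is a minimal sketch in base R, assuming a hypothetical 20-row subset named train that stands in for one resampled training set; it is not the code cv.glm runs internally.

    ```r
    # Hypothetical stand-in for one training subset created during cross-validation
    train <- mtcars[1:20, ]

    # Bare names are looked up in `data`, so every column has 20 rows
    ok <- model.frame(cyl ~ mpg + disp, data = train)
    nrow(ok)

    # mtcars$cyl ignores `data` and returns all 32 values,
    # while mpg comes from the 20-row subset, so the lengths disagree
    bad <- try(model.frame(mtcars$cyl ~ mpg, data = train), silent = TRUE)
    inherits(bad, "try-error")
    ```

    The second call fails inside model.frame, which is the same function named in the error message from the question.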

    Other advice on the formula:

    Avoid mixing explicitly specified variables with the dot, because you end up including variables twice. The dot stands for all variables in the data frame except those on the left of the tilde.
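
    One way to check what a formula expands to is to inspect its term labels. This is a small sketch using base R's terms() on the mtcars example from above; the object names mixed and bare are only illustrative.

    ```r
    # Expand the dot against mtcars and list the resulting predictors
    mixed <- attr(terms(mtcars$cyl ~ mtcars$mpg + ., data = mtcars), "term.labels")
    print(mixed)  # look for both "mtcars$mpg" and "mpg"

    # Same formula with bare names, for comparison
    bare <- attr(terms(cyl ~ mpg + ., data = mtcars), "term.labels")
    print(bare)
    ```

    Comparing the two printed vectors shows whether a predictor enters the model under two different names.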

    Why do you add a zero? If it is an attempt to remove the intercept, note that -1 is the equivalent and more common way to write it. However, dropping the intercept is bad practice in my opinion.
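
    As a quick check of the two spellings, here is a short sketch that fits the same no-intercept model both ways on mtcars; the choice of wt and hp as predictors is arbitrary and only for illustration.

    ```r
    # Two ways to drop the intercept in an R formula
    fit_zero  <- lm(mpg ~ 0 + wt + hp, data = mtcars)
    fit_minus <- lm(mpg ~ wt + hp - 1, data = mtcars)

    # Both fits have no intercept term and the same coefficients
    attr(terms(fit_zero), "intercept")
    all.equal(coef(fit_zero), coef(fit_minus))
    ```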
