Object not found error when passing model formula to another function

悲&欢浪女 2020-12-03 03:53

I have a weird problem with R that I can't seem to work out.

I've tried to write a function that performs K-fold cross validation for a model chosen by the stepwis

2 Answers
  • 2020-12-03 04:20

    When you created your formula, lm.cars, it was assigned its own environment. This environment stays with the formula unless you explicitly change it. So when you extract the formula with the formula function, the original environment of the model is included.

    I don't know if I'm using the correct terminology here, but I think you need to explicitly change the environment for the formula inside your function:

    cv.step <- function(linmod,k=10,direction="both"){
      response <- linmod$y
      dmatrix <- linmod$x
      n <- length(response)
      datas <- linmod$model
      .env <- environment() ## identify the environment of cv.step
    
      ## extract the formula in the environment of cv.step
      form <- as.formula(linmod$call, env = .env) 
    
      ## The rest of your function follows
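
    To see the mechanism in isolation, here is a minimal sketch. The names make_formula, local_var and captured are hypothetical and only serve the illustration. It shows that a formula remembers the environment it was created in, and that this environment can be reassigned afterwards:

    ```r
    # hypothetical helper: the formula is created inside this function
    make_formula <- function() {
      local_var <- 42
      y ~ x  # captures the environment of this function call
    }

    f <- make_formula()

    # the formula's environment is the function's frame, so local_var lives there
    captured <- exists("local_var", envir = environment(f), inherits = FALSE)
    print(captured)

    # the environment can be replaced explicitly, here with the global environment
    environment(f) <- globalenv()
    print(identical(environment(f), globalenv()))
    ```

    Passing env = to as.formula, as in the function above, has the same effect as assigning environment(f) by hand.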
    
  • 2020-12-03 04:41

    Another problem that can cause this is passing a character string to lm instead of a formula. Character vectors have no environment, so when lm converts the string to a formula, the result apparently is not assigned the local environment of the calling function. lm evaluates weights first in data and then in the environment of the formula, so if the weights object is not a column of the data.frame passed as data but exists only as a local variable or argument of the calling function, one gets an object-not-found error. This behavior is not easy to understand. It is probably a bug.

    Here's a minimal reproducible example. Both functions take a data.frame, two variable names and a vector of weights to use; the first passes the formula to lm as a string, the second converts it with as.formula first.

    residualizer = function(data, x, y, wtds) {
      #the formula to use
      f = "x ~ y" 
    
      #residualize
      resid(lm(formula = f, data = data, weights = wtds))
    }
    
    residualizer2 = function(data, x, y, wtds) {
      #the formula to use
      f = as.formula("x ~ y")
    
      #residualize
      resid(lm(formula = f, data = data, weights = wtds))
    }
    
    d_example = data.frame(x = rnorm(10), y = rnorm(10))
    weightsvar = runif(10)
    

    And test:

    > residualizer(data = d_example, x = "x", y = "y", wtds = weightsvar)
    Error in eval(expr, envir, enclos) : object 'wtds' not found
    
    > residualizer2(data = d_example, x = "x", y = "y", wtds = weightsvar)
             1          2          3          4          5          6          7          8          9         10 
     0.8986584 -1.1218003  0.6215950 -0.1106144  0.1042559  0.9997725 -1.1634717  0.4540855 -0.4207622 -0.8774290 
    

    It is a very subtle bug. If one goes into the function environment with browser, one can see the weights vector just fine, but it somehow is not found in the lm call!
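
    One way to sidestep the problem, sketched here under assumptions: the function name residualizer3 is hypothetical, and unlike the examples above it builds the formula from the x and y arguments. The idea is to convert the string inside the function and pin the formula's environment to the function's own frame, so that lm can find wtds there:

    ```r
    # hypothetical variant of the functions above
    residualizer3 <- function(data, x, y, wtds) {
      # build the formula from the variable names and attach this function's
      # environment, which is where wtds lives
      f <- as.formula(paste(x, "~", y), env = environment())

      # weights is looked up in data first, then in environment(f)
      resid(lm(formula = f, data = data, weights = wtds))
    }

    set.seed(1)
    d_example <- data.frame(x = rnorm(10), y = rnorm(10))
    weightsvar <- runif(10)

    res <- residualizer3(data = d_example, x = "x", y = "y", wtds = weightsvar)
    print(length(res))
    ```

    Another option, if it suits the use case, is to add the weights as a column of data, since data is searched before the formula's environment.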

    The bug becomes even harder to debug if one uses the name weights for the weights variable. In this case, since lm can't find the weights object, the lookup falls through to the function weights() from the stats package, thus throwing an even stranger error:

    Error in model.frame.default(formula = f, data = data, weights = weights,  : 
      invalid type (closure) for variable '(weights)'
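
    The word closure in that message refers to a function. As a small check, assuming a fresh session with no variable called weights defined, one can inspect what the name resolves to (is_fun and ns_name are hypothetical helper names):

    ```r
    # with no user-defined object called weights, the name resolves to a function
    is_fun <- is.function(weights)
    print(is_fun)

    # report the namespace that function is defined in
    ns_name <- environmentName(environment(weights))
    print(ns_name)
    ```

    Picking a name that does not collide with an existing function, such as wtds above, makes the plainer object-not-found error appear instead.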
    

    Don't ask me how many hours it took me to figure this out.
