I have a weird problem with R that I can't seem to work out.
I've tried to write a function that performs K-fold cross validation for a model chosen by the stepwis
When you created your formula, lm.cars, it was assigned its own environment. This environment stays with the formula unless you explicitly change it. So when you extract the formula with the formula function, the original environment of the model is included.
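To make the point about formula environments concrete, here is a small sketch. The names make_formula and local_var are made up for illustration and are not part of the original question:

```r
# A formula remembers the environment in which it was created.
make_formula <- function() {
  local_var <- 1   # exists only inside this function call
  y ~ x            # the formula created here captures this frame
}

f <- make_formula()
env <- environment(f)

# The captured environment is the function's frame, not the global one.
identical(env, globalenv())                          # FALSE
exists("local_var", envir = env, inherits = FALSE)   # TRUE

# The environment stays attached until it is replaced explicitly.
environment(f) <- globalenv()
identical(environment(f), globalenv())               # TRUE
```

Anything lm later evaluates against the formula (weights, subset and so on) is looked up in the data first and then in that attached environment.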
I don't know if I'm using the correct terminology here, but I think you need to explicitly change the environment for the formula inside your function:
cv.step <- function(linmod, k = 10, direction = "both") {
  response <- linmod$y
  dmatrix <- linmod$x
  n <- length(response)
  datas <- linmod$model
  .env <- environment() ## identify the environment of cv.step
  ## extract the formula in the environment of cv.step
  form <- as.formula(linmod$call, env = .env)
  ## The rest of your function follows
Another problem that can cause this is passing a character (string) vector to lm instead of a formula. Character vectors have no environment, so when lm converts the character to a formula, the result apparently is not assigned the local environment of the calling function. If one then uses an object as weights that is not in the data.frame passed as the data argument, but exists only as a local function argument, one gets a "not found" error. This behavior is not very easy to understand. It is probably a bug.
Here's a minimal reproducible example. This function takes a data.frame, two variable names and a vector of weights to use.
residualizer = function(data, x, y, wtds) {
  # the formula to use
  f = "x ~ y"
  # residualize
  resid(lm(formula = f, data = data, weights = wtds))
}
residualizer2 = function(data, x, y, wtds) {
  # the formula to use
  f = as.formula("x ~ y")
  # residualize
  resid(lm(formula = f, data = data, weights = wtds))
}
d_example = data.frame(x = rnorm(10), y = rnorm(10))
weightsvar = runif(10)
And test:
> residualizer(data = d_example, x = "x", y = "y", wtds = weightsvar)
Error in eval(expr, envir, enclos) : object 'wtds' not found
> residualizer2(data = d_example, x = "x", y = "y", wtds = weightsvar)
1 2 3 4 5 6 7 8 9 10
0.8986584 -1.1218003 0.6215950 -0.1106144 0.1042559 0.9997725 -1.1634717 0.4540855 -0.4207622 -0.8774290
It is a very subtle bug. If one goes into the function environment with browser, one can see the weights vector just fine, but it somehow is not found in the lm call!
The bug becomes even harder to debug if one used the name weights for the weights variable. In this case, since lm can't find the local weights object, the lookup falls through to the function weights() from the stats package, thus throwing an even stranger error:
Error in model.frame.default(formula = f, data = data, weights = weights, :
invalid type (closure) for variable '(weights)'
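A quick way to see where the "closure" in that message comes from is to check what the bare name weights refers to in a fresh session. This is only a sketch and assumes no object called weights has been defined in the workspace:

```r
# With no variable named `weights` in scope, the name resolves to a function.
is.function(weights)                    # TRUE
typeof(weights)                         # "closure"
environmentName(environment(weights))   # "stats"
```

So model.frame ends up being handed a function where it expects a numeric vector.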
Don't ask me how many hours it took me to figure this out.
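For what it's worth, one possible way around it is to build the formula inside the function and set its environment explicitly, so the weights argument can be found. This is a rough sketch, not a definitive fix: residualizer3 is a hypothetical name, and unlike the functions above it actually uses the x and y arguments to build the formula.

```r
# Hypothetical variant: build the formula from the variable names and
# pin its environment to this function's frame so `wtds` is visible to lm.
residualizer3 <- function(data, x, y, wtds) {
  f <- as.formula(paste(x, "~", y), env = environment())
  resid(lm(formula = f, data = data, weights = wtds))
}

set.seed(1)  # only so the example is reproducible
d_example <- data.frame(x = rnorm(10), y = rnorm(10))
weightsvar <- runif(10)

res <- residualizer3(d_example, x = "x", y = "y", wtds = weightsvar)
length(res)  # 10
```

The same idea applies if the formula arrives as a string from the caller: convert it with as.formula inside the function before handing it to lm.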