Question
I found a strange behavior in R when using lm().
Based on the cars dataset, the following functions plot the fitted braking distance from a localized linear regression at speed 30.
func1 <- function(fm, spd) {
  w <- dnorm(cars$speed - spd, sd = 5)
  fit <- lm(formula = as.formula(fm), weights = w, data = cars)
  plot(fitted(fit))
}
func2 <- function(fm, spd) {
  w <- dnorm(cars$speed - spd, sd = 5)
  fit <- lm(formula = fm, weights = w, data = cars)
  plot(fitted(fit))
}
func1("dist ~ speed", 30)
func2(dist ~ speed, 30)
func1 works, but func2 fails with the following message:
Error in eval(expr, envir, enclos) : object 'w' not found
The only difference between the two functions is that func2 receives a formula object as its argument.
When using lm() in this style, should a formula be passed as a character string instead?
I tested this with R 3.2.1 and RStudio 0.99.467 on Windows 7.
Answer 1:
Very interesting case! This relates deeply to how environments work in R. In short, it seems we should be careful when passing a formula object defined outside a function into that function. Although there are ways to work around this, the behavior may surprise us.
?formula says:
A formula object has an associated environment, and this environment (rather than the parent environment) is used by model.frame to evaluate variables that are not found in the supplied data argument.
In your func1, the formula is created inside the function, so it is associated with the function's environment (each function call creates its own environment). Hence, when objects are not found in data, the lm call looks for them in the function environment. That is how w is found in func1.
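To make the association concrete, here is a minimal sketch that checks which environment each kind of formula carries. The helper name formula_env_is_local and the variable outer_env are hypothetical, introduced only for this illustration.

```r
# Hypothetical helper: builds a formula from a string inside a function
# and reports whether the formula is tied to that function's environment.
formula_env_is_local <- function(txt) {
  f <- as.formula(txt)                      # created inside the function
  identical(environment(f), environment())  # compare with the function env
}

local_check <- formula_env_is_local("dist ~ speed")

# A formula written directly in the calling code carries the environment
# in which it was written (the global environment at the top level).
outer_env <- environment()
outer_check <- identical(environment(dist ~ speed), outer_env)

print(c(local = local_check, outer = outer_check))
```

Both checks should come out TRUE: the string-based formula is tied to the function call, while the literal formula is tied to wherever it was typed.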
In the second example, the formula is defined outside the function, or more precisely, in the global environment. Hence lm looks in the global environment for objects that are not found in data. Since there is no w in the global environment, it fails. Worse, if another w happens to exist in the global environment, that w is silently picked up and used as the weights.
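The silent pickup can be reproduced with a small sketch. func2_fit is a hypothetical copy of func2 that returns the fit instead of plotting it, and the global w of all ones is an assumption chosen so that the effect is easy to verify.

```r
# Hypothetical copy of func2 that returns the fitted model.
func2_fit <- function(fm, spd) {
  w <- dnorm(cars$speed - spd, sd = 5)  # local weights, never used below
  lm(formula = fm, weights = w, data = cars)
}

# An unrelated w living where the formula is written.
w <- rep(1, nrow(cars))

fit_outer_w    <- func2_fit(dist ~ speed, 30)   # no error this time
fit_unweighted <- lm(dist ~ speed, data = cars)

# With weights of 1, the result matches an ordinary unweighted fit,
# which shows that the outer w was used rather than the local one.
print(all.equal(coef(fit_outer_w), coef(fit_unweighted)))
```

No error or warning is raised here, which is what makes this case harder to notice than the "object 'w' not found" failure.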
Here is an example that highlights the order of object search.
The data only has y, so the lm call looks for x elsewhere.
But there are two x: fm, the formula defined in the global environment, finds x = 1:10, while as.formula(ch), created inside the function, finds x = 10:1.
environment() tells you which environment a formula is associated with.
fun <- function(fm, ch) {
  x <- 10:1
  dat <- data.frame(y = 1:10)
  print(environment(fm))
  print(lm(fm, data = dat))
  cat("<--- refers to x in the global\n")
  print(environment(as.formula(ch)))
  print(lm(as.formula(ch), data = dat))
  cat("<--- refers to x in the function\n\n")
}
x <- c(1:10)
fun(y ~ x, "y ~ x")
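Given the lookup order shown above, two possible workarounds for the original func2 can be sketched as follows. These are illustrations rather than a recommended API: func3 and func4 are hypothetical names, and both return the fit instead of plotting it.

```r
# Variant 1 (hypothetical): keep the weights in the data itself, so lm
# finds w in `data` and never consults the formula's environment.
# Note: a formula such as dist ~ . would then treat w as a predictor.
func3 <- function(fm, spd) {
  dat <- cars
  dat$w <- dnorm(dat$speed - spd, sd = 5)
  lm(formula = fm, data = dat, weights = w)
}

# Variant 2 (hypothetical): re-point the local copy of the formula at the
# function environment, so objects missing from `data` are looked up here.
func4 <- function(fm, spd) {
  w <- dnorm(cars$speed - spd, sd = 5)
  environment(fm) <- environment()
  lm(formula = fm, data = cars, weights = w)
}

fit3 <- func3(dist ~ speed, 30)
fit4 <- func4(dist ~ speed, 30)
print(all.equal(coef(fit3), coef(fit4)))
```

The first variant keeps everything the model needs in one data frame, which tends to be easier to reason about. The second changes where every free variable in the formula is resolved, so it can hide objects the caller intended the formula to see.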
See also: Environments - Advanced R.
Source: https://stackoverflow.com/questions/34452369/using-lm-of-r-a-formula-object-should-be-passed-as-character