Using lm() in R, should a formula object be passed as a character string?

Posted by 和自甴很熟 on 2019-12-10 15:15:35

Question


I found some strange behavior in R when using lm().

Using the built-in cars dataset, each of the following functions plots the fitted braking distance from a locally weighted linear regression centered at a given speed (30 in the calls below).

func1 <- function(fm, spd){
  w <- dnorm(cars$speed - spd, sd=5)
  fit <- lm(formula = as.formula(fm), weights = w, data=cars)
  plot(fitted(fit))
}

func2 <- function(fm, spd){
  w <- dnorm(cars$speed - spd, sd=5)
  fit <- lm(formula = fm, weights = w, data=cars)
  plot(fitted(fit))
}

func1("dist ~ speed", 30)
func2(dist ~ speed, 30)

func1 works, but func2 fails with the following message:

Error in eval(expr, envir, enclos) : object 'w' not found

The only difference between the two functions is that func2 receives an object of class formula as its argument, while func1 receives a character string.

When calling lm() this way, should the formula be passed as a character string?

I tested this with R-3.2.1, RStudio 0.99.467, Windows7.


Answer 1:


Very interesting case! This is closely tied to how environments work in R. In short, it seems we should not pass a formula object defined outside a function into that function. There are some ways to work around this, but the behavior may still surprise us.

?formula says:

A formula object has an associated environment, and this environment (rather than the parent environment) is used by model.frame to evaluate variables that are not found in the supplied data argument.

In your func1, the formula is created inside the function by as.formula(), so it is associated with the function's environment (each function call creates its own environment). Hence, when objects are not found in data, the lm call looks for them in the function's environment. That is how w is found in func1.

In the second example, the formula is defined outside the function, or more precisely, in the global environment. Hence objects that are not found in data are looked up in the global environment. Since there is no w in the global environment, the call fails. What could be worse: if you happen to have another w in the global environment, that w would silently be used as the weights instead of the one computed inside the function.
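The silent mix-up described above can be reproduced with a minimal sketch. It reuses func2 from the question without the plot call; the global w of all ones and the names fit_global and fit_plain are made up for illustration. Because weights of all ones are equivalent to no weighting, the comparison at the end shows which w was actually used.

```r
# func2 from the question, returning the fit instead of plotting it.
func2 <- function(fm, spd) {
  w <- dnorm(cars$speed - spd, sd = 5)   # local weights, intended for the fit
  lm(formula = fm, weights = w, data = cars)
}

# Hypothetical global `w`, unrelated to the local one inside func2.
w <- rep(1, nrow(cars))

fit_global <- func2(dist ~ speed, 30)        # no error this time
fit_plain  <- lm(dist ~ speed, data = cars)  # ordinary unweighted fit

# If the global w (all ones) was picked up, both fits have the same coefficients.
print(all.equal(coef(fit_global), coef(fit_plain)))
```

The call no longer fails, which makes this case harder to spot than the original error.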

Here is an example that highlights the search order. The data frame contains only y, so the lm call has to look for x elsewhere, and there are two candidates. fm, the formula created in the global environment, finds x = 1:10, while as.formula(ch), created inside the function, finds x = 10:1. environment() tells you which environment each formula is associated with.

fun <- function(fm, ch) {
  x <- 10:1
  dat <- data.frame(y = 1:10)

  print(environment(fm))
  print(lm(fm, data = dat))
  cat("<--- refers to x in the global\n") 

  print(environment(as.formula(ch)))
  print(lm(as.formula(ch), data = dat))
  cat("<--- refers to x in the function\n\n")
}

x <- c(1:10)
fun(y ~ x, "y ~ x")
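One possible workaround, sketched here as an assumption rather than as the recommended fix, is to re-point the formula's environment at the function's own environment before calling lm(). The names func3, fit3, w_ref and fit_ref are hypothetical and not part of the original question.

```r
# Variant of func2 that accepts a formula object and still finds the local w.
func3 <- function(fm, spd) {
  w <- dnorm(cars$speed - spd, sd = 5)
  # Re-associate the (local copy of the) formula with this function's
  # environment, so that `weights = w` is resolved here.
  environment(fm) <- environment()
  lm(formula = fm, weights = w, data = cars)
}

fit3 <- func3(dist ~ speed, 30)

# Reference fit built at top level with the same weights.
w_ref   <- dnorm(cars$speed - 30, sd = 5)
fit_ref <- lm(dist ~ speed, weights = w_ref, data = cars)

print(all.equal(coef(fit3), coef(fit_ref)))
```

The trade-off is that any variable in the formula that is missing from data is now searched for starting in the function's environment rather than in the environment where the caller wrote the formula, so this can surprise callers who rely on their own local variables. The caller's formula object itself is left unchanged, since only the copy inside the function is modified.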

See also: Environments - Advanced R.



Source: https://stackoverflow.com/questions/34452369/using-lm-of-r-a-formula-object-should-be-passed-as-character
