Let's say I have the following formula:
myformula <- formula("depVar ~ Var1 + Var2")
How can I reliably get the dependent variable name from the formula?
Based on your edit asking for the actual response, not just its name, we can use the non-standard evaluation idiom employed by lm() and most other modelling functions with a formula interface in base R:
form <- formula("depVar ~ Var1 + Var2")
dat <- data.frame(depVar = rnorm(10), Var1 = rnorm(10), Var2 = rnorm(10))
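getResponse <- function(formula, data) {
    # Capture the call; the arguments must be named "formula" and "data"
    # so that match() below can find them
    mf <- match.call(expand.dots = FALSE)
    m <- match(c("formula", "data"), names(mf), 0L)
    mf <- mf[c(1L, m)]
    mf$drop.unused.levels <- TRUE
    # Turn the call into a call to model.frame() and evaluate it in the caller
    mf[[1L]] <- as.name("model.frame")
    mf <- eval(mf, parent.frame())
    # Extract the response from the model frame
    y <- model.response(mf, "numeric")
    y
}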
> getResponse(form, dat)
          1           2           3           4           5
-0.02828573 -0.41157817  2.45489291  1.39035938 -0.31267835
          6           7           8           9          10
-0.39945771 -0.09141438  0.81826105  0.37448482 -0.55732976
As you see, this gets the actual response variable data from the supplied data frame.
How this works is that the function first captures its own call with match.call(). The expand.dots = FALSE setting comes from the lm() idiom, where it keeps the ... argument unexpanded because that contains things not needed for the evaluation of the data for the formula; getResponse() has no ... argument, so it makes no difference here.
Next, the "formula" and "data" arguments are matched against the names in the call. The line mf[c(1L, m)] selects the function name from the call (1L) and the locations of the two matched arguments. The drop.unused.levels argument of model.frame() is set to TRUE in the next line, and then the call is updated to switch the function name in the call from getResponse to model.frame. All the above code does is take the call to getResponse() and process it into a call to the model.frame() function, which is the same kind of processing lm() applies to its own call.
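To see what the rewritten call looks like before it is evaluated, here is a minimal sketch. The helper showCall() is hypothetical (it is not part of base R or of the answer above); it repeats the same steps but returns the call object instead of evaluating it. A ... argument is added only to show that extra arguments are dropped.

```r
# Hypothetical helper: same steps as getResponse(), but the rewritten
# call is returned rather than evaluated.
showCall <- function(formula, data, ...) {
    mf <- match.call(expand.dots = FALSE)            # capture the call as typed
    m <- match(c("formula", "data"), names(mf), 0L)  # positions of the two arguments
    mf <- mf[c(1L, m)]                               # keep function name + matched arguments
    mf$drop.unused.levels <- TRUE                    # add an argument to the call
    mf[[1L]] <- as.name("model.frame")               # swap the function being called
    mf
}

form <- formula("depVar ~ Var1 + Var2")
dat <- data.frame(depVar = rnorm(10), Var1 = rnorm(10), Var2 = rnorm(10))

# The extra argument ends up in ... and is discarded by the subsetting step
cl <- showCall(form, dat, extra = "ignored")
print(cl)
```

The printed object is an unevaluated call to model.frame() carrying the arguments formula = form, data = dat and drop.unused.levels = TRUE. Note that it still refers to the symbols form and dat, not to their values, which is why the environment it is evaluated in matters.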
This modified call is then evaluated in the calling environment of the function (parent.frame()), which in this case is the global environment.
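The choice of parent.frame() matters when the function is not called from the top level. The sketch below assumes the function's arguments are named formula and data, and uses a hypothetical wrapper() whose formula and data exist only as local variables; the names localForm and localDat are made up for illustration.

```r
getResponse <- function(formula, data) {
    mf <- match.call(expand.dots = FALSE)
    m <- match(c("formula", "data"), names(mf), 0L)
    mf <- mf[c(1L, m)]
    mf$drop.unused.levels <- TRUE
    mf[[1L]] <- as.name("model.frame")
    # The call refers to the caller's symbols, so evaluate it in the caller
    mf <- eval(mf, parent.frame())
    model.response(mf, "numeric")
}

# Hypothetical wrapper: localForm and localDat are visible only inside it
wrapper <- function() {
    localForm <- y ~ x
    localDat <- data.frame(y = c(1.5, 2.5, 3.5), x = c(10, 20, 30))
    getResponse(localForm, localDat)
}

res <- wrapper()
print(res)
```

Inside wrapper(), the captured call is getResponse(formula = localForm, data = localDat). Because the rewritten call is evaluated in the frame of wrapper(), those two symbols are found even though neither exists in the global environment.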
The last line uses the model.response() extractor function to take the response variable from the model frame.
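For comparison, here is a small sketch of both extractions without the call manipulation: model.response() applied to a model frame built directly, and the name of the response taken from the formula itself. The data values are made up, and the name extraction assumes a two-sided formula with a plain variable on the left-hand side.

```r
form <- formula("depVar ~ Var1 + Var2")
dat <- data.frame(depVar = c(1, 2, 3), Var1 = c(4, 5, 6), Var2 = c(7, 8, 9))

# Build the model frame directly and pull out the response values
mf <- model.frame(form, data = dat)
y <- model.response(mf, "numeric")

# For a two-sided formula, element 2 is the left-hand side
respName <- deparse(form[[2L]])

# all.vars() lists variable names in order of appearance, so the first is
# the response when the left-hand side is a single variable
respName2 <- all.vars(form)[1L]

print(y)
print(respName)
```

With a transformed response such as log(depVar), the two approaches differ: deparse(form[[2L]]) returns the whole expression as text, while all.vars() returns only the variable name inside it.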