How to reliably get dependent variable name from formula object?

Let's say I have the following formula:

```
myformula <- formula("depVar ~ Var1 + Var2")
```

How to reliably get dependent variable name from formula object?

7 Answers

刺人心

2021-01-31 03:10
Try using all.vars:
```
all.vars(myformula)[1]
```
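To see why taking the first element works, it helps to look at the whole vector that all.vars returns. The sketch below assumes a two-sided formula with a single response; the names `vars` and `response_name` are only illustrative.

```r
myformula <- formula("depVar ~ Var1 + Var2")

# all.vars() lists variable names in order of appearance,
# so the left-hand side comes first in a two-sided formula
vars <- all.vars(myformula)
response_name <- vars[1]

print(vars)           # "depVar" "Var1" "Var2"
print(response_name)  # "depVar"
```

For a one-sided formula there is no response, so the first element would be a predictor instead.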
北恋

2021-01-31 03:15
I know this question is quite old, but I thought I'd add a base R answer which doesn't require indexing, doesn't depend on the order of the variables listed in a call to all.vars, and which gives the response variables as separate elements when there is more than one:
```
myformula <- formula("depVar1 + depVar2 ~ Var1 + Var2")
all_vars <- all.vars(myformula)
response <- all_vars[!(all_vars %in% labels(terms(myformula)))]

> response
[1] "depVar1" "depVar2"
```
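One thing worth checking with this approach is what happens when a predictor is transformed, because the term label is then "log(x)" rather than "x". The sketch below uses a made-up formula and shows one possible variant that compares against the right-hand-side variables via delete.response(); it assumes no variable appears on both sides of the formula.

```r
# Hypothetical formula with a transformed predictor
f <- y ~ log(x) + z

# Approach from the answer above: the term labels are "log(x)" and "z",
# so the bare name "x" is not filtered out
all_vars <- all.vars(f)
naive <- all_vars[!(all_vars %in% labels(terms(f)))]

# Variant: drop the response from the terms object and compare
# against the variables that remain on the right-hand side
rhs_vars <- all.vars(delete.response(terms(f)))
response <- setdiff(all.vars(f), rhs_vars)

print(naive)     # "y" "x"
print(response)  # "y"
```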
北恋

2021-01-31 03:17
Using all.vars is very tricky as it won't detect the response from a one-sided formula. For example
```
all.vars(~x+1)
[1] "x"
```
Here "x" is a predictor, not a response, so taking the first element gives the wrong answer.

A more reliable way is to check the "response" attribute of the terms object first:
```
getResponseFromFormula <- function(formula) {
    # attr(terms(.), "response") is 1 for a two-sided formula, 0 otherwise
    if (attr(terms(as.formula(formula)), which = "response"))
        all.vars(formula)[1]
    else
        NULL
}

getResponseFromFormula(~x+1)
NULL

getResponseFromFormula(y~x+1)
[1] "y"
```
Note that you can replace all.vars(formula)[1] in the function with formula[[2]] (double brackets, which extract the left-hand side as an unevaluated expression) if the response contains more than one variable.
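As a rough illustration of what the left-hand side looks like when it is extracted directly, the sketch below assumes a two-sided formula (length 3) and uses illustrative names `lhs`, `lhs_text` and `lhs_names`.

```r
f <- depVar1 + depVar2 ~ Var1 + Var2

# For a two-sided formula, element 2 is the left-hand side,
# stored as an unevaluated call
lhs <- f[[2]]

lhs_text <- deparse(lhs)    # the response as a single string
lhs_names <- all.vars(lhs)  # the individual variable names

print(lhs_text)   # "depVar1 + depVar2"
print(lhs_names)  # "depVar1" "depVar2"
```

Whether the deparsed string or the vector of names is more useful depends on what the caller does with the result.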
面向向阳花

2021-01-31 03:18
Based on your edit to get the actual response, not just its name, we can use the nonstandard evaluation idiom employed by lm() and most other modelling functions with a formula interface in base R:
```
form <- formula("depVar ~ Var1 + Var2")
dat <- data.frame(depVar = rnorm(10), Var1 = rnorm(10), Var2 = rnorm(10))

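# The first argument must be named "formula": match.call() labels arguments
# by their formal names, and the match() below looks for "formula" and "data"
getResponse <- function(formula, data) {
    mf <- match.call(expand.dots = FALSE)
    m <- match(c("formula", "data"), names(mf), 0L)
    mf <- mf[c(1L, m)]                   # keep function name + matched args
    mf$drop.unused.levels <- TRUE
    mf[[1L]] <- as.name("model.frame")   # turn the call into model.frame(...)
    mf <- eval(mf, parent.frame())
    y <- model.response(mf, "numeric")
    y
}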

> getResponse(form, dat)
          1           2           3           4           5 
-0.02828573 -0.41157817  2.45489291  1.39035938 -0.31267835 
          6           7           8           9          10 
-0.39945771 -0.09141438  0.81826105  0.37448482 -0.55732976
```
As you see, this gets the actual response variable data from the supplied data frame.

How this works is that the function first captures the function call without expanding the ... argument as that contains things not needed for the evaluation of the data for the formula.

Next, the "formula" and "data" arguments are matched with the call. The line mf[c(1L, m)] selects the function name from the call (1L) and the locations of the two matched arguments. The drop.unused.levels argument of model.frame() is set to TRUE in the next line, and then the call is updated to switch the function name in the call from getResponse to model.frame. This is the same processing that lm() applies to its own call: it takes the original call and turns it into a call to the model.frame() function.

This modified call is then evaluated in the parent environment of the function - which in this case is the global environment.

The last line uses the model.response() extractor function to take the response variable from the model frame.
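When the formula and the data frame are already available as ordinary objects, the same two extractor steps can be sketched without capturing and rewriting a call. This is a simplified variant rather than the idiom described above; `set.seed(1)` is added only so the example is reproducible.

```r
set.seed(1)
form <- formula("depVar ~ Var1 + Var2")
dat <- data.frame(depVar = rnorm(10), Var1 = rnorm(10), Var2 = rnorm(10))

# Build the model frame directly, then pull out the response column
mf <- model.frame(form, data = dat, drop.unused.levels = TRUE)
y <- model.response(mf, "numeric")

# The values are those of dat$depVar, named by row
print(all.equal(unname(y), dat$depVar))
```

The call-rewriting version is mainly useful inside a modelling function that has to evaluate the caller's arguments in the caller's environment.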

抹茶落季

2021-01-31 03:20

I suppose you could also cook your own function to work with terms():

```
getResponse <- function(formula) {
    tt <- terms(formula)
    vars <- as.character(attr(tt, "variables"))[-1] ## [1] is the list call
    response <- attr(tt, "response") # index of response var
    vars[response] 
}

R> myformula <- formula("depVar ~ Var1 + Var2")
R> getResponse(myformula)
[1] "depVar"
```

It is just as hacky as as.character(myformula)[[2]], but you have the assurance that you get the correct variable, as the ordering of the call parse tree isn't going to change any time soon.

This isn't so good with multiple dependent variables:

```
R> myformula <- formula("depVar1 + depVar2 ~ Var1 + Var2")
R> getResponse(myformula)
[1] "depVar1 + depVar2"
```

as they'll need further processing.
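One way that further processing might look is to keep the response as an expression and pass it to all.vars instead of converting it to a string. The function name `getResponseVars` is hypothetical, and the sketch assumes that returning an empty character vector is acceptable when there is no response.

```r
getResponseVars <- function(formula) {
    tt <- terms(formula)
    idx <- attr(tt, "response")          # 0 when the formula is one-sided
    if (idx == 0L) return(character(0))
    # attr(tt, "variables") is a call to list(); element 1 is `list`,
    # so the response expression sits at position idx + 1
    all.vars(attr(tt, "variables")[[idx + 1L]])
}

myformula <- formula("depVar1 + depVar2 ~ Var1 + Var2")
print(getResponseVars(myformula))  # "depVar1" "depVar2"
print(getResponseVars(~ x + 1))    # character(0)
```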

死守一世寂寞

2021-01-31 03:27

I found a useful package, 'formula.tools', that suits this task.

Code example:

```
library(formula.tools)

f <- as.formula(a1 + a2 ~ a3 + a4)

lhs.vars(f) # get dependent variables
[1] "a1" "a2"

rhs.vars(f) # get independent variables
[1] "a3" "a4"
```
