How to reliably get dependent variable name from formula object?

清酒与你 2021-01-31 02:29

Let's say I have the following formula:

myformula <- formula("depVar ~ Var1 + Var2")

How can I reliably get the dependent variable name from the formula object?

7 answers
  • 2021-01-31 03:10

    Try using all.vars:

    all.vars(myformula)[1]
    
  • 2021-01-31 03:15

    I know this question is quite old, but I thought I'd add a base R answer which doesn't require indexing, doesn't depend on the order of the variables listed in a call to all.vars, and which gives the response variables as separate elements when there is more than one:

    myformula <- formula("depVar1 + depVar2 ~ Var1 + Var2")
    all_vars <- all.vars(myformula)
    response <- all_vars[!(all_vars %in% labels(terms(myformula)))]
    
    > response
    [1] "depVar1" "depVar2"
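
    One caveat worth checking before relying on this: labels(terms()) returns term labels rather than variable names, so the comparison assumes every predictor enters the formula as a bare name. The sketch below wraps the same logic in a hypothetical function (responseByExclusion is a name introduced here, not part of the answer) and compares a plain formula with one that transforms a predictor:

    ```r
    # Hypothetical wrapper around the exclusion logic shown above.
    responseByExclusion <- function(f) {
        all_vars <- all.vars(f)
        all_vars[!(all_vars %in% labels(terms(f)))]
    }

    # Bare predictor names: only the response variables remain.
    plain <- responseByExclusion(depVar1 + depVar2 ~ Var1 + Var2)

    # Transformed predictor: the term label is "log(Var1)", so "Var1"
    # is not excluded and shows up alongside the response.
    transformed <- responseByExclusion(depVar ~ log(Var1) + Var2)

    print(plain)        # "depVar1" "depVar2"
    print(transformed)  # "depVar" "Var1"
    ```

    So the approach seems safe for formulas built from plain variable names, and less so once terms such as log(Var1) appear on the right-hand side.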
    
  • 2021-01-31 03:17

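    Relying on all.vars alone is risky because it cannot tell whether a formula has a response at all. For a one-sided formula, for example,

    all.vars(~x+1)
    [1] "x"

    taking the first element returns a predictor, which is wrong because the formula has no response.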

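    A more reliable way of getting the response is to check the "response" attribute of terms() first:

    getResponseFromFormula <- function(formula) {
        formula <- as.formula(formula)  # also accepts a character string
        # attr(terms(), "response") is 1 for a two-sided formula, 0 otherwise
        if (attr(terms(formula), which = "response"))
            all.vars(formula)[1]
        else
            NULL
    }

    getResponseFromFormula(~x+1)
    NULL

    getResponseFromFormula(y~x+1)
    [1] "y"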
    

    Note that if the left-hand side contains more than one variable, you can replace all.vars(formula)[1] in the function with formula[[2]], which returns the whole left-hand side of a two-sided formula as an unevaluated expression.
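
    As a rough sketch of that variation, the hypothetical helper below (lhsVarNames is a name introduced here, not part of the answer) applies all.vars to the left-hand side only, so each response variable comes back as a separate name. It assumes a two-sided formula has length 3 and a one-sided one has length 2:

    ```r
    # Hypothetical helper: names of all variables on the left-hand side.
    lhsVarNames <- function(formula) {
        formula <- as.formula(formula)
        # A one-sided formula has no left-hand side to inspect.
        if (attr(terms(formula), "response") == 0L)
            return(NULL)
        # formula[[2]] is the unevaluated left-hand side expression.
        all.vars(formula[[2L]])
    }

    one_sided <- lhsVarNames(~ x + 1)
    single    <- lhsVarNames(y ~ x + 1)
    multiple  <- lhsVarNames(depVar1 + depVar2 ~ Var1 + Var2)
    ```

    Because all.vars only sees the left-hand side, a transformed response such as log(y) would yield "y" rather than the full expression.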

  • 2021-01-31 03:18

    Based on your edit asking for the actual response, not just its name, we can use the nonstandard evaluation idiom employed by lm() and most other modelling functions with a formula interface in base R:

    form <- formula("depVar ~ Var1 + Var2")
    dat <- data.frame(depVar = rnorm(10), Var1 = rnorm(10), Var2 = rnorm(10))
    
    # The first argument is named `formula` so that match() below can find
    # it among the names of the captured call.
    getResponse <- function(formula, data) {
        mf <- match.call(expand.dots = FALSE)
        m <- match(c("formula", "data"), names(mf), 0L)
        mf <- mf[c(1L, m)]
        mf$drop.unused.levels <- TRUE
        mf[[1L]] <- as.name("model.frame")
        mf <- eval(mf, parent.frame())
        model.response(mf, "numeric")
    }
    
    > getResponse(form, dat)
              1           2           3           4           5 
    -0.02828573 -0.41157817  2.45489291  1.39035938 -0.31267835 
              6           7           8           9          10 
    -0.39945771 -0.09141438  0.81826105  0.37448482 -0.55732976
    

    As you see, this gets the actual response variable data from the supplied data frame.

    The function first captures its own call with match.call(). In lm(), expand.dots = FALSE keeps the ... argument unexpanded, because it contains things that are not needed to evaluate the data for the formula.

    Next, the "formula" and "data" arguments are matched against the names in the call. The line mf[c(1L, m)] keeps the function name (position 1L) and the positions of the two matched arguments. The next line sets the drop.unused.levels argument of model.frame() to TRUE, and the line after that replaces the function name in the call with model.frame. In other words, the code above takes the call to getResponse() (or to lm(), in the original idiom) and turns it into a call to model.frame().

    This modified call is then evaluated in the calling frame of the function (parent.frame()), which in this case is the global environment.

    The last line uses the model.response() extractor function to take the response variable from the model frame.
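
    To make the intermediate step visible, here is a minimal sketch that stops before evaluation and returns the rewritten call. buildModelFrameCall is a hypothetical name introduced for illustration, and the small data frame is made up for the example:

    ```r
    # Hypothetical helper: build, but do not evaluate, the model.frame() call.
    buildModelFrameCall <- function(formula, data) {
        mf <- match.call(expand.dots = FALSE)
        m <- match(c("formula", "data"), names(mf), 0L)
        mf <- mf[c(1L, m)]                   # keep function name + matched args
        mf$drop.unused.levels <- TRUE
        mf[[1L]] <- as.name("model.frame")   # swap the function being called
        mf
    }

    form <- depVar ~ Var1 + Var2
    dat <- data.frame(depVar = c(1, 2, 3), Var1 = c(4, 5, 6), Var2 = c(7, 8, 9))

    cl <- buildModelFrameCall(form, dat)
    print(cl)  # shows the unevaluated call object

    # Evaluating the call where form and dat are visible yields the model
    # frame, from which the response can be extracted.
    y <- model.response(eval(cl), "numeric")
    ```

    Printing cl before evaluating it is a convenient way to confirm that only the formula, data and drop.unused.levels arguments survive the rewrite.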

  • 2021-01-31 03:20

    I suppose you could also cook your own function to work with terms():

    getResponse <- function(formula) {
        tt <- terms(formula)
        vars <- as.character(attr(tt, "variables"))[-1] ## [1] is the list call
        response <- attr(tt, "response") # index of response var
        vars[response] 
    }
    
    R> myformula <- formula("depVar ~ Var1 + Var2")
    R> getResponse(myformula)
    [1] "depVar"
    

    It is just as hacky as as.character(myformula)[[2]], but you have the assurance that you get the correct variable, as the ordering of the call parse tree isn't going to change any time soon.

    This isn't so good with multiple dependent variables:

    R> myformula <- formula("depVar1 + depVar2 ~ Var1 + Var2")
    R> getResponse(myformula)
    [1] "depVar1 + depVar2"
    

    as they'll need further processing.
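
    One possible form of that further processing, sketched under the assumption that the individual variable names are what is wanted: instead of converting the "variables" attribute to character, pick out the response expression and pass it to all.vars. The name responseVarsFromTerms is hypothetical and introduced here only for illustration:

    ```r
    # Hypothetical variant of the terms()-based approach.
    responseVarsFromTerms <- function(formula) {
        tt <- terms(formula)
        idx <- attr(tt, "response")      # 0 when there is no response
        if (idx == 0L)
            return(character(0))
        # attr(tt, "variables") is a call to list(); element 1 is `list`
        # itself, so the response expression sits at position idx + 1.
        all.vars(attr(tt, "variables")[[idx + 1L]])
    }

    single   <- responseVarsFromTerms(depVar ~ Var1 + Var2)
    multiple <- responseVarsFromTerms(depVar1 + depVar2 ~ Var1 + Var2)
    none     <- responseVarsFromTerms(~ Var1 + Var2)
    ```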

  • 2021-01-31 03:27

    I found a useful package, formula.tools, which is suitable for this task.

    Code example:

    library(formula.tools)

    f <- as.formula(a1 + a2 ~ a3 + a4)

    lhs.vars(f)  # get dependent variables
    [1] "a1" "a2"

    rhs.vars(f)  # get independent variables
    [1] "a3" "a4"
