Why is enquo + !! preferable to substitute + eval?

灰色年华 2020-11-29 03:04

In the following example, why should we favour using f1 over f2? Is it more efficient in some sense? For someone used to base R, it seems more natu

4 answers
  • 2020-11-29 03:24

    I want to give an answer that is independent of dplyr, because there is a very clear advantage to using enquo over substitute. Both look in the calling environment of a function to identify the expression that was given to that function. The difference is that substitute() does it only once, while !!enquo() will correctly walk up the entire calling stack.

    Consider a simple function that uses substitute():

    f <- function( myExpr ) {
      eval( substitute(myExpr), list(a=2, b=3) )
    }
    
    f(a+b)   # 5
    f(a*b)   # 6
    

    This functionality breaks when the call is nested inside another function:

    g <- function( myExpr ) {
      val <- f( substitute(myExpr) )
      ## Do some stuff
      val
    }
    
    g(a+b)
    # myExpr     <-- OOPS
    

    Now consider the same functions re-written using enquo():

    library( rlang )
    
    f2 <- function( myExpr ) {
      eval_tidy( enquo(myExpr), list(a=2, b=3) )
    }
    
    g2 <- function( myExpr ) {
      val <- f2( !!enquo(myExpr) )
      val
    }
    
    g2( a+b )    # 5
    g2( b/a )    # 1.5
    

    And that is why enquo() + !! is preferable to substitute() + eval(). dplyr simply takes full advantage of this property to build a coherent set of NSE functions.
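
    One reason the quosure version composes is that a quosure records the environment in which the expression was written, so variables local to an intermediate caller are still found. The following is only a minimal sketch, assuming rlang is installed; `eval_ab` and `outer_fn` are hypothetical names introduced here for illustration:

    ```r
    library(rlang)

    # Hypothetical helper: evaluates the captured expression with a and b
    # supplied as a data mask.
    eval_ab <- function(expr) {
      eval_tidy(enquo(expr), list(a = 2, b = 3))
    }

    # Hypothetical caller: `k` exists only inside this function, yet it is
    # found because the quosure remembers outer_fn()'s environment.
    outer_fn <- function(expr) {
      k <- 10
      eval_ab(k * !!enquo(expr))
    }

    outer_fn(a + b)   # 50
    ```

    Here `a` and `b` come from the data mask while `k` comes from the caller's frame, without any explicit environment bookkeeping.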

    UPDATE: rlang 0.4.0 introduced a new operator {{ (pronounced "curly curly"), which is effectively shorthand for !!enquo(). This allows us to simplify the definition of g2 to:

    g2 <- function( myExpr ) {
      val <- f2( {{myExpr}} )
      val
    }
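
    As a quick check that the two spellings behave the same, here is a small self-contained sketch, assuming rlang >= 0.4.0. The names `f_tidy`, `g_bang` and `g_curly` are hypothetical and only mirror f2 and g2 above:

    ```r
    library(rlang)

    # Same shape as f2 above: evaluate the captured expression with a data mask.
    f_tidy <- function(myExpr) {
      eval_tidy(enquo(myExpr), list(a = 2, b = 3))
    }

    # Forwarding with !!enquo() ...
    g_bang <- function(myExpr) f_tidy(!!enquo(myExpr))

    # ... and forwarding with {{ }}.
    g_curly <- function(myExpr) f_tidy({{ myExpr }})

    identical(g_bang(b / a), g_curly(b / a))   # TRUE
    ```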
    
  • 2020-11-29 03:28

    To add some nuance, these things are not necessarily that complex in base R.

    It is important to use eval.parent() where relevant, so that substituted arguments are evaluated in the right environment. If you use eval.parent() properly, the expressions in nested calls will find their way. If you don't, you might discover environment hell :).
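
    To make the difference concrete, here is a minimal base R sketch. The function names are hypothetical and chosen only for illustration. eval() with its default envir evaluates in the frame of the function that calls it, whereas eval.parent() evaluates one frame further up:

    ```r
    # Evaluates the captured expression in this function's own frame.
    eval_here <- function(expr) {
      x <- "inner"
      eval(substitute(expr))
    }

    # Evaluates the captured expression in the caller's frame.
    eval_in_caller <- function(expr) {
      x <- "inner"
      eval.parent(substitute(expr))
    }

    caller <- function() {
      x <- "caller"
      c(eval_here(x), eval_in_caller(x))
    }

    caller()   # "inner" "caller"
    ```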

    The base toolbox that I use is made of quote(), substitute(), bquote(), as.call(), and do.call() (the latter is useful when combined with substitute()).
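
    For readers less familiar with these tools, here is a short sketch of three of them on toy expressions. The variable names are arbitrary and not part of the examples that follow:

    ```r
    rhs <- quote(a + b)

    # bquote() builds a call and fills in whatever is wrapped in .()
    e1 <- bquote(2 * .(rhs))                     # 2 * (a + b)

    # as.call() turns a list (function name first) into a call
    e2 <- as.call(list(quote(sum), 1, 2, 3))     # sum(1, 2, 3)

    # do.call() lets substitute() work on an expression stored in a variable
    tmpl <- quote(v * 2)
    e3 <- do.call("substitute", list(tmpl, list(v = quote(a))))   # a * 2

    eval(e1, list(a = 2, b = 3))   # 10
    eval(e2)                       # 6
    eval(e3, list(a = 2))          # 4
    ```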

    Without going into details, here is how to solve in base R the cases presented by @Artem and @Tung without any tidy evaluation, followed by the last example again, this time not using quo / enquo but still benefiting from splicing and unquoting (!!! and !!).

    We'll see that splicing and unquoting make code nicer (but require functions to support them!), and that in the present cases using quosures doesn't improve things dramatically (but still arguably does).
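
    The "requires functions to support it" caveat can be seen without dplyr. In this small sketch, assuming rlang is installed, rlang::expr() processes !!! while base quote() leaves it untouched. `some_verb` is a hypothetical function name; the call is only built, never evaluated:

    ```r
    library(rlang)

    grouping_vars <- list(quote(x), quote(y))

    # rlang::expr() understands !!! and splices the symbols into the call.
    spliced <- expr(some_verb(d, !!!grouping_vars))
    deparse(spliced)      # "some_verb(d, x, y)"

    # base quote() has no such support and keeps `!!!grouping_vars` verbatim,
    # as a single argument made of nested calls to `!`.
    unspliced <- quote(some_verb(d, !!!grouping_vars))
    length(unspliced)     # 3: the function name, d, and the !!! expression
    ```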

    solving Artem's case with base R

    f0 <- function( myExpr ) {
      eval(substitute(myExpr), list(a=2, b=3))
    }
    
    g0 <- function( myExpr ) {
      val <- eval.parent(substitute(f0(myExpr)))
      val
    }
    
    f0(a+b)
    #> [1] 5
    g0(a+b)
    #> [1] 5
    

    solving Tung's 1st case with base R

    my_summarise0 <- function(df, group_var, select_var) {
    
      group_var  <- substitute(group_var)
      select_var <- substitute(select_var)
    
      # create new name
      mean_name <- paste0("mean_", as.character(select_var))
    
      eval.parent(substitute(
      df %>%
        select(select_var, group_var) %>% 
        group_by(group_var) %>%
        summarise(mean_name := mean(select_var))))
    }
    
    library(dplyr)
    set.seed(1234)
    d = data.frame(x = c(1, 1, 2, 2, 3),
                   y = rnorm(5),
                   z = runif(5))
    my_summarise0(d, x, z)
    #> # A tibble: 3 x 2
    #>       x mean_z
    #>   <dbl>  <dbl>
    #> 1     1  0.619
    #> 2     2  0.603
    #> 3     3  0.292
    

    solving Tung's 2nd case with base R

    grouping_vars <- c(quote(x), quote(y))
    eval(as.call(c(quote(group_by), quote(d), grouping_vars))) %>%
      summarise(mean_z = mean(z))
    #> # A tibble: 5 x 3
    #> # Groups:   x [3]
    #>       x      y mean_z
    #>   <dbl>  <dbl>  <dbl>
    #> 1     1 -1.21   0.694
    #> 2     1  0.277  0.545
    #> 3     2 -2.35   0.923
    #> 4     2  1.08   0.283
    #> 5     3  0.429  0.292
    

    in a function:

    my_summarise02 <- function(df, select_var, ...) {
    
      group_var  <- eval(substitute(alist(...)))
      select_var <- substitute(select_var)
    
      # create new name
      mean_name <- paste0("mean_", as.character(select_var))
    
      df %>%
        {eval(as.call(c(quote(select),quote(.), select_var, group_var)))} %>% 
        {eval(as.call(c(quote(group_by),quote(.), group_var)))} %>%
        {eval(bquote(summarise(.,.(mean_name) := mean(.(select_var)))))}
    }
    
    my_summarise02(d, z, x, y)
    #> # A tibble: 5 x 3
    #> # Groups:   x [3]
    #>       x      y mean_z
    #>   <dbl>  <dbl>  <dbl>
    #> 1     1 -1.21   0.694
    #> 2     1  0.277  0.545
    #> 3     2 -2.35   0.923
    #> 4     2  1.08   0.283
    #> 5     3  0.429  0.292
    
    

    solving Tung's 2nd case with base R but using !! and !!!

    grouping_vars <- c(quote(x), quote(y))
    
    d %>%
      group_by(!!!grouping_vars) %>%
      summarise(mean_z = mean(z))
    #> # A tibble: 5 x 3
    #> # Groups:   x [3]
    #>       x      y mean_z
    #>   <dbl>  <dbl>  <dbl>
    #> 1     1 -1.21   0.694
    #> 2     1  0.277  0.545
    #> 3     2 -2.35   0.923
    #> 4     2  1.08   0.283
    #> 5     3  0.429  0.292
    

    in a function :

    my_summarise03 <- function(df, select_var, ...) {
    
      group_var  <- eval(substitute(alist(...)))
      select_var <- substitute(select_var)
    
      # create new name
      mean_name <- paste0("mean_", as.character(select_var))
    
      df %>%
        select(!!select_var, !!!group_var) %>% 
        group_by(!!!group_var) %>%
        summarise(.,!!mean_name := mean(!!select_var))
    }
    
    my_summarise03(d, z, x, y)
    #> # A tibble: 5 x 3
    #> # Groups:   x [3]
    #>       x      y mean_z
    #>   <dbl>  <dbl>  <dbl>
    #> 1     1 -1.21   0.694
    #> 2     1  0.277  0.545
    #> 3     2 -2.35   0.923
    #> 4     2  1.08   0.283
    #> 5     3  0.429  0.292
    
    
  • 2020-11-29 03:31

    enquo() and !! also allow you to program with other dplyr verbs such as group_by and select. I'm not sure whether substitute and eval can do that. Take a look at this example, where I modify your data frame a little bit:

    library(dplyr)
    
    set.seed(1234)
    d = data.frame(x = c(1, 1, 2, 2, 3),
                   y = rnorm(5),
                   z = runif(5))
    
    # select, group_by & create a new output name based on input supplied
    my_summarise <- function(df, group_var, select_var) {
    
      group_var <- enquo(group_var)
      select_var <- enquo(select_var)
    
      # create new name
      mean_name <- paste0("mean_", quo_name(select_var))
    
      df %>%
        select(!!select_var, !!group_var) %>% 
        group_by(!!group_var) %>%
        summarise(!!mean_name := mean(!!select_var))
    }
    
    my_summarise(d, x, z)
    
    # A tibble: 3 x 2
    #       x mean_z
    #   <dbl>  <dbl>
    # 1    1.  0.619
    # 2    2.  0.603
    # 3    3.  0.292
    

    Edit: enquos & !!! also make it easier to capture a list of variables:

    # example
    grouping_vars <- quos(x, y)
    d %>%
      group_by(!!!grouping_vars) %>%
      summarise(mean_z = mean(z))
    
    # A tibble: 5 x 3
    # Groups:   x [?]
    #       x      y mean_z
    #   <dbl>  <dbl>  <dbl>
    # 1    1. -1.21   0.694
    # 2    1.  0.277  0.545
    # 3    2. -2.35   0.923
    # 4    2.  1.08   0.283
    # 5    3.  0.429  0.292
    
    
    # in a function
    my_summarise2 <- function(df, select_var, ...) {
    
      group_var <- enquos(...)
      select_var <- enquo(select_var)
    
      # create new name
      mean_name <- paste0("mean_", quo_name(select_var))
    
      df %>%
        select(!!select_var, !!!group_var) %>% 
        group_by(!!!group_var) %>%
        summarise(!!mean_name := mean(!!select_var))
    }
    
    my_summarise2(d, z, x, y)
    
    # A tibble: 5 x 3
    # Groups:   x [?]
    #       x      y mean_z
    #   <dbl>  <dbl>  <dbl>
    # 1    1. -1.21   0.694
    # 2    1.  0.277  0.545
    # 3    2. -2.35   0.923
    # 4    2.  1.08   0.283
    # 5    3.  0.429  0.292
    

    Credit: Programming with dplyr

  • 2020-11-29 03:43

    Imagine there is a different x you want to multiply:

    > x <- 3
    > f1(d, !!x)
      x            y two_y
    1 1 -2.488894875     6
    2 2 -1.133517746     6
    3 3 -1.024834108     6
    4 4  0.730537366     6
    5 5 -1.325431756     6
    

    vs without the !!:

    > f1(d, x)
      x            y two_y
    1 1 -2.488894875     2
    2 2 -1.133517746     4
    3 3 -1.024834108     6
    4 4  0.730537366     8
    5 5 -1.325431756    10
    

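    !! gives you more control over scoping than substitute: the caller decides whether x refers to the data frame column or to the variable in the calling environment. With substitute you can only get the second behaviour easily.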
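
    The same effect can be reproduced without dplyr. The sketch below assumes rlang is installed. `double_of` is a hypothetical stand-in written for this illustration, not the f1 from the question:

    ```r
    library(rlang)

    # Hypothetical stand-in: doubles an expression that is evaluated with the
    # data frame's columns acting as a data mask.
    double_of <- function(data, expr) {
      2 * eval_tidy(enquo(expr), data)
    }

    d <- data.frame(x = 1:5)
    x <- 3

    double_of(d, x)     # the column x is used: 2 4 6 8 10
    double_of(d, !!x)   # the global x is injected before capture: 6
    ```

    Unlike mutate(), this stand-in does not recycle the single value across rows, which is why the second call returns one number.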
