How to feed a list of unquoted column names into `lapply` (so that I can use it with a `dplyr` function)

陌清茗 2021-02-08 14:33

I am trying to write a function in tidyverse/dplyr that I want to eventually use with lapply (or map). (I had been working on it to answer

3 Answers
  • 2021-02-08 14:57

    as.name will convert a string to a name and that can be passed to report:

    lapply(cat.list, function(x) do.call("report", list(as.name(x))))
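
    To see why this works, here is a minimal base R sketch. The helper show_arg is hypothetical (it is not part of the question) and stands in for any function that captures its argument unevaluated, as report does with enquo:

    ```r
    # Hypothetical stand-in for `report`: returns the expression it was given as text
    show_arg <- function(x) deparse(substitute(x))

    cols <- c("REPORT_CODE", "PAYMENT_METHOD")

    # as.name() turns each string into a symbol; do.call() then builds the call
    # show_arg(REPORT_CODE), so the function sees a bare column name
    res <- lapply(cols, function(x) do.call("show_arg", list(as.name(x))))

    print(unlist(res))
    ```

    Calling show_arg(x) directly inside the lapply would capture the literal expression x instead, which is the problem as.name plus do.call works around.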
    

    character argument

    An alternative is to rewrite report so that it accepts a character string argument:

    report_ch <- function(colname) {  
        report_cat <- rlang::sym(colname)   # as.name(colname) would also work here
        sample_data %>%
                    group_by(!!report_cat, YEAR) %>%
                    summarize(num = n(), total = sum(AMOUNT)) %>% 
                    rename(REPORT_VALUE = !!report_cat) %>% 
                    mutate(REPORT_CATEGORY = colname)
    }
    
    lapply(cat.list, report_ch)
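
    The question's sample_data is not shown here, so the following is only a sketch on a made-up data frame (toy_data, its values, and the function name report_toy are all assumptions for illustration). It follows the same pattern as report_ch and stacks the per-column results with bind_rows:

    ```r
    library(dplyr)

    # Made-up data with the same column names the answers refer to
    toy_data <- data.frame(
      REPORT_CODE    = c("J", "Q", "Q", "R"),
      PAYMENT_METHOD = c("Check", "Check", "Cash", "Cash"),
      YEAR           = c("FY14", "FY16", "FY16", "FY17"),
      AMOUNT         = c(25, 1, 100, 50),
      stringsAsFactors = FALSE
    )

    # Same idea as report_ch: take the column name as a string, convert it to a symbol
    report_toy <- function(colname) {
      col <- rlang::sym(colname)
      toy_data %>%
        group_by(!!col, YEAR) %>%
        summarize(num = n(), total = sum(AMOUNT), .groups = "drop") %>%
        rename(REPORT_VALUE = !!col) %>%
        mutate(REPORT_CATEGORY = colname)
    }

    # One summary per column, combined into a single data frame
    res <- bind_rows(lapply(c("REPORT_CODE", "PAYMENT_METHOD"), report_toy))
    print(res)
    ```

    The .groups argument assumes dplyr 1.0.0 or later; on older versions it can be dropped.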
    

    wrapr

    Another approach is to rewrite report using the wrapr package, which is an alternative to rlang/tidyeval:

    library(dplyr)
    library(wrapr)
    
    report_wrapr <- function(colname) 
      let(c(COLNAME = colname),
          sample_data %>%
                      group_by(COLNAME, YEAR) %>%
                      summarize(num = n(), total = sum(AMOUNT)) %>%
                      rename(REPORT_VALUE = COLNAME) %>%
                      mutate(REPORT_CATEGORY = colname)
       )
    
    lapply(cat.list, report_wrapr)
    

    Of course, this whole problem would go away if you used a different framework, e.g.

    plyr

    library(plyr)
    
    report_plyr <- function(colname)
      ddply(sample_data, c(REPORT_VALUE = colname, "YEAR"), function(x)
         data.frame(num = nrow(x), total = sum(x$AMOUNT), REPORT_CATEGORY = colname))
    
    lapply(cat.list, report_plyr)
    

    sqldf

    library(sqldf)
    
    report_sql <- function(colname, envir = parent.frame(), ...)
      fn$sqldf("select [$colname] REPORT_VALUE,
                       YEAR,
                       count(*) num,
                       sum(AMOUNT) total,
                       '$colname' REPORT_CATEGORY
                from sample_data
                group by [$colname], YEAR", envir = envir, ...)
    
    lapply(cat.list, report_sql)              
    

    base - by

    report_base_by <- function(colname)
          do.call("rbind", 
            by(sample_data, sample_data[c(colname, "YEAR")], function(x)
                data.frame(REPORT_VALUE = x[1, colname], 
                           YEAR = x$YEAR[1], 
                           num = nrow(x), 
                           total = sum(x$AMOUNT), 
                           REPORT_CATEGORY = colname)
             )
          )
    
    lapply(cat.list, report_base_by)
    

    data.table

    The data.table package provides another alternative, but that has already been covered by another answer.

    Update: Added additional alternatives.

  • 2021-02-08 15:03

    Let me first point out that in your initial report function, you can use quo_name to convert the quosure into a string, which you can then use in mutate like the following:

    library(dplyr)
    library(rlang)
    
    report <- function(report_cat){
      report_cat <- enquo(report_cat)
    
      sample_data %>%
        group_by(!!report_cat, YEAR) %>%
        summarize(num=n(),total=sum(AMOUNT)) %>%
        rename(REPORT_VALUE = !!report_cat) %>%
        mutate(REPORT_CATEGORY = quo_name(report_cat))
    }
    
    report(REPORT_CODE)
    

    Now, to address your question of "how to feed a list of unquoted strings through lapply or map to make it work inside dplyr functions", I propose two ways of doing it.

    1. Use rlang::sym to convert each string to a symbol and unquote it with !! when calling report from lapply or map

    library(purrr)
    
    cat.list <- c("REPORT_CODE","PAYMENT_METHOD","INBOUND_CHANNEL","AMOUNT_CAT")
    
    map_df(cat.list, ~report(!!sym(.)))    
    

    Or, with syms, you can convert all elements of a vector at once:

    map_df(syms(cat.list), ~report(!!.))
    

    Result:

    # A tibble: 27 x 5
    # Groups:   REPORT_VALUE [16]
       REPORT_VALUE  YEAR   num total REPORT_CATEGORY
              <chr> <chr> <int> <int>           <chr>
     1            J  FY14     1    25     REPORT_CODE
     2            Q  FY16     1     1     REPORT_CODE
     3            Q  FY17     1   100     REPORT_CODE
     4            R  FY17     1    50     REPORT_CODE
     5            R  FY18     2    75     REPORT_CODE
     6            S  FY17     2   400     REPORT_CODE
     7            S  FY18     2   530     REPORT_CODE
     8        Check  FY14     1    25  PAYMENT_METHOD
     9        Check  FY17     1    50  PAYMENT_METHOD
    10        Check  FY18     2    55  PAYMENT_METHOD
    # ... with 17 more rows 
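
    The unquoting step in approach 1 can be checked without any data. In this sketch, label_of is a hypothetical stand-in for report that only returns the name of the column it captured:

    ```r
    library(rlang)

    # Hypothetical stand-in for `report`: captures its argument and returns its label
    label_of <- function(col) {
      col <- enquo(col)
      quo_name(col)
    }

    cols <- c("REPORT_CODE", "PAYMENT_METHOD")

    # sym() builds one symbol per string; !! unquotes it at the call site
    via_sym <- vapply(cols, function(nm) label_of(!!sym(nm)), character(1),
                      USE.NAMES = FALSE)

    # syms() converts the whole vector first; each element is then unquoted
    via_syms <- vapply(syms(cols), function(s) label_of(!!s), character(1))

    print(via_sym)
    print(unname(via_syms))
    ```

    Both variants hand label_of the bare column name, just as if it had been typed unquoted.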
    

    2. Rewrite your report function by placing lapply or map inside it, so that report itself can use non-standard evaluation (NSE)

    report <- function(...){
      report_cat <- quos(...)
    
      map_df(report_cat, function(x) sample_data %>%
                 group_by(!!x, YEAR) %>%
                 summarize(num=n(),total=sum(AMOUNT)) %>%
                 rename(REPORT_VALUE = !!x) %>%
                 mutate(REPORT_CATEGORY = quo_name(x)))
    }
    

    By placing map_df inside report, you can take advantage of quos, which converts ... into a list of quosures. These are then fed into map_df and unquoted one by one with !!.

    report(REPORT_CODE, PAYMENT_METHOD, INBOUND_CHANNEL, AMOUNT_CAT)
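
    As a rough illustration of what quos hands to map_df, the sketch below captures two bare names and inspects the result. The helper capture_cols is hypothetical and does nothing but return the captured quosures:

    ```r
    library(rlang)

    # Hypothetical helper: returns whatever was passed in `...` as quosures
    capture_cols <- function(...) quos(...)

    qs <- capture_cols(REPORT_CODE, PAYMENT_METHOD)

    # One quosure per argument; quo_name() recovers the column name as a string
    n_captured <- length(qs)
    labels <- unname(vapply(qs, quo_name, character(1)))

    print(n_captured)
    print(labels)
    ```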
    

    Another advantage of writing it this way is that you can also supply a character vector of column names: convert it to symbols with syms and splice them in with !!!, as follows:

    report(!!!syms(cat.list))
    

    Result:

    # A tibble: 27 x 5
    # Groups:   REPORT_VALUE [16]
       REPORT_VALUE  YEAR   num total REPORT_CATEGORY
              <chr> <chr> <int> <int>           <chr>
     1            J  FY14     1    25     REPORT_CODE
     2            Q  FY16     1     1     REPORT_CODE
     3            Q  FY17     1   100     REPORT_CODE
     4            R  FY17     1    50     REPORT_CODE
     5            R  FY18     2    75     REPORT_CODE
     6            S  FY17     2   400     REPORT_CODE
     7            S  FY18     2   530     REPORT_CODE
     8        Check  FY14     1    25  PAYMENT_METHOD
     9        Check  FY17     1    50  PAYMENT_METHOD
    10        Check  FY18     2    55  PAYMENT_METHOD
    # ... with 17 more rows
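
    The splicing call can also be checked in isolation. This sketch (capture_cols is again a hypothetical helper, and the column vector is shortened) suggests that splicing syms(cat.list) with !!! gives the function the same arguments as typing the names by hand:

    ```r
    library(rlang)

    # Hypothetical helper: returns whatever was passed in `...` as quosures
    capture_cols <- function(...) quos(...)

    cat.list <- c("REPORT_CODE", "PAYMENT_METHOD")

    # Names typed unquoted versus a character vector converted and spliced
    typed   <- capture_cols(REPORT_CODE, PAYMENT_METHOD)
    spliced <- capture_cols(!!!syms(cat.list))

    typed_labels   <- unname(vapply(typed, quo_name, character(1)))
    spliced_labels <- unname(vapply(spliced, quo_name, character(1)))

    print(identical(typed_labels, spliced_labels))
    ```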
    
  • 2021-02-08 15:05

    I'm not really a dplyr aficionado, but for what it's worth, here is how you could achieve this using library(data.table) instead:

    setDT(sample_data)
    
    gen_report <- function(report_cat){
      sample_data[ , .(num = .N, total = sum(AMOUNT), REPORT_CATEGORY = report_cat), 
                   by = .(REPORT_VALUE = get(report_cat), YEAR)] 
    }
    
    gen_report('REPORT_CODE')
    lapply(cat.list, gen_report)
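
    The key step above is get(report_cat) inside by, which looks up the column whose name is stored in the string. A small sketch on made-up data (the toy table and its values are assumptions, not the question's sample_data):

    ```r
    library(data.table)

    # Made-up data with the column names used in the answer
    toy <- data.table(
      REPORT_CODE = c("J", "Q", "Q"),
      YEAR        = c("FY14", "FY16", "FY16"),
      AMOUNT      = c(25, 1, 100)
    )

    col <- "REPORT_CODE"

    # get(col) resolves the string to the REPORT_CODE column inside `by`,
    # and the result is renamed to REPORT_VALUE
    out <- toy[, .(num = .N, total = sum(AMOUNT)),
               by = .(REPORT_VALUE = get(col), YEAR)]

    print(out)
    ```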
    