I am trying to write a function in tidyverse/dplyr that I want to eventually use with lapply (or map). (I had been working on it to answer
as.name will convert a string to a name, which can then be passed to report:
lapply(cat.list, function(x) do.call("report", list(as.name(x))))
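To see why as.name matters here, the following minimal sketch uses a hypothetical stand-in function, show_arg, that only reports the expression it received. It is not part of the original code and needs nothing beyond base R:

```r
# show_arg is a hypothetical stand-in for report(): it returns the
# expression it was called with, as a string.
show_arg <- function(x) deparse(substitute(x))

col <- "REPORT_CODE"

# Called directly, the function sees the variable name `col` itself.
direct <- show_arg(col)

# Built with do.call() and as.name(), the function sees the column name.
via_name <- do.call("show_arg", list(as.name(col)))

print(direct)
print(via_name)
```

The second form is what lets a function written for unquoted column names be driven from a character vector.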
character argument
An alternative is to rewrite report so that it accepts a character string argument:
report_ch <- function(colname) {
  report_cat <- rlang::sym(colname) # as.name(colname) would also work here
  sample_data %>%
    group_by(!!report_cat, YEAR) %>%
    summarize(num = n(), total = sum(AMOUNT)) %>%
    rename(REPORT_VALUE = !!report_cat) %>%
    mutate(REPORT_CATEGORY = colname)
}
lapply(cat.list, report_ch)
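For trying this out, here is a self-contained sketch of the character-argument version. The sample_data below is a small made-up data frame, since the real one is not shown here; only its column names follow the ones used above.

```r
library(dplyr)

# Made-up stand-in for sample_data (assumption: the real data has
# at least these columns).
sample_data <- data.frame(
  REPORT_CODE    = c("J", "Q", "Q", "R"),
  PAYMENT_METHOD = c("Check", "Check", "Card", "Card"),
  YEAR           = c("FY14", "FY16", "FY16", "FY17"),
  AMOUNT         = c(25, 1, 100, 50),
  stringsAsFactors = FALSE
)

report_ch <- function(colname) {
  report_cat <- rlang::sym(colname)  # string -> symbol
  sample_data %>%
    group_by(!!report_cat, YEAR) %>%
    summarize(num = n(), total = sum(AMOUNT)) %>%
    rename(REPORT_VALUE = !!report_cat) %>%
    mutate(REPORT_CATEGORY = colname)
}

# One summary table per column name, returned as a list.
res <- lapply(c("REPORT_CODE", "PAYMENT_METHOD"), report_ch)
print(res[[1]])
```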
wrapr
An alternative approach is to rewrite report using the wrapr package, which is an alternative to rlang/tidyeval:
library(dplyr)
library(wrapr)
report_wrapr <- function(colname)
  let(c(COLNAME = colname),
    sample_data %>%
      group_by(COLNAME, YEAR) %>%
      summarize(num = n(), total = sum(AMOUNT)) %>%
      rename(REPORT_VALUE = COLNAME) %>%
      mutate(REPORT_CATEGORY = colname)
  )
lapply(cat.list, report_wrapr)
Of course, this whole problem would go away if you used a different framework, e.g.
plyr
library(plyr)
report_plyr <- function(colname)
  ddply(sample_data, c(REPORT_VALUE = colname, "YEAR"), function(x)
    data.frame(num = nrow(x), total = sum(x$AMOUNT), REPORT_CATEGORY = colname))
lapply(cat.list, report_plyr)
sqldf
library(sqldf)
report_sql <- function(colname, envir = parent.frame(), ...)
  fn$sqldf("select [$colname] REPORT_VALUE,
            YEAR,
            count(*) num,
            sum(AMOUNT) total,
            '$colname' REPORT_CATEGORY
            from sample_data
            group by [$colname], YEAR", envir = envir, ...)
lapply(cat.list, report_sql)
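The fn$ prefix comes from the gsubfn package (loaded along with sqldf) and substitutes $colname into the query string before sqldf sees it. A small sketch of just that substitution step, using fn$identity on a shortened, illustrative query so that no database is involved:

```r
library(gsubfn)

colname <- "REPORT_CODE"

# fn$identity returns its (interpolated) string argument unchanged,
# which makes the substitution visible. The query text is a shortened
# illustration, not the full query from report_sql.
query <- fn$identity("select [$colname] REPORT_VALUE from sample_data group by [$colname]")

print(query)
```

Printing the string this way can help when debugging a generated query.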
base - by
report_base_by <- function(colname)
  do.call("rbind",
    by(sample_data, sample_data[c(colname, "YEAR")], function(x)
      data.frame(REPORT_VALUE = x[1, colname],
                 YEAR = x$YEAR[1],
                 num = nrow(x),
                 total = sum(x$AMOUNT),
                 REPORT_CATEGORY = colname)
    )
  )
lapply(cat.list, report_base_by)
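Note that lapply returns a list with one data frame per column name. If a single stacked data frame is wanted, the pieces can be combined with do.call("rbind", ...). A base-R-only sketch, again using a made-up sample_data (an assumption, since the real data is not shown here):

```r
# Made-up stand-in for sample_data.
sample_data <- data.frame(
  REPORT_CODE    = c("J", "Q", "Q", "R"),
  PAYMENT_METHOD = c("Check", "Check", "Card", "Card"),
  YEAR           = c("FY14", "FY16", "FY16", "FY17"),
  AMOUNT         = c(25, 1, 100, 50),
  stringsAsFactors = FALSE
)

report_base_by <- function(colname)
  do.call("rbind",
    by(sample_data, sample_data[c(colname, "YEAR")], function(x)
      data.frame(REPORT_VALUE = x[1, colname],
                 YEAR = x$YEAR[1],
                 num = nrow(x),
                 total = sum(x$AMOUNT),
                 REPORT_CATEGORY = colname)
    )
  )

cats <- c("REPORT_CODE", "PAYMENT_METHOD")

# Stack the per-column reports into one data frame.
combined <- do.call("rbind", lapply(cats, report_base_by))
print(combined)
```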
data.table
The data.table package provides another alternative, but that has already been covered by another answer.
Update: Added additional alternatives.
Let me first point out that in your initial report function, you can use quo_name to convert the quosure into a string, which you can then use in mutate like the following:
library(dplyr)
library(rlang)
report <- function(report_cat){
  report_cat <- enquo(report_cat)
  sample_data %>%
    group_by(!!report_cat, YEAR) %>%
    summarize(num = n(), total = sum(AMOUNT)) %>%
    rename(REPORT_VALUE = !!report_cat) %>%
    mutate(REPORT_CATEGORY = quo_name(report_cat))
}
report(REPORT_CODE)
Now, to address your question of "how to feed a list of unquoted strings through lapply or map to make it work inside dplyr functions", I propose two ways of doing it.
First, use rlang::sym to parse your strings and unquote them when feeding them into lapply or map:
library(purrr)
cat.list <- c("REPORT_CODE","PAYMENT_METHOD","INBOUND_CHANNEL","AMOUNT_CAT")
map_df(cat.list, ~report(!!sym(.)))
or with syms you can parse all elements of a vector at once:
map_df(syms(cat.list), ~report(!!.))
Result:
# A tibble: 27 x 5
# Groups:   REPORT_VALUE [16]
   REPORT_VALUE YEAR    num total REPORT_CATEGORY
   <chr>        <chr> <int> <int> <chr>
 1 J            FY14      1    25 REPORT_CODE
 2 Q            FY16      1     1 REPORT_CODE
 3 Q            FY17      1   100 REPORT_CODE
 4 R            FY17      1    50 REPORT_CODE
 5 R            FY18      2    75 REPORT_CODE
 6 S            FY17      2   400 REPORT_CODE
 7 S            FY18      2   530 REPORT_CODE
 8 Check        FY14      1    25 PAYMENT_METHOD
 9 Check        FY17      1    50 PAYMENT_METHOD
10 Check        FY18      2    55 PAYMENT_METHOD
# ... with 17 more rows
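To make the difference between sym and syms concrete, this small sketch only inspects what the two functions return, without touching any data:

```r
library(rlang)

cat.list <- c("REPORT_CODE", "PAYMENT_METHOD", "INBOUND_CHANNEL", "AMOUNT_CAT")

# sym() converts a single string into a single symbol.
one <- sym(cat.list[1])

# syms() converts a character vector into a list of symbols.
all_syms <- syms(cat.list)

print(is.symbol(one))
print(length(all_syms))
```

Each element of all_syms is what gets unquoted with !! inside the map_df call.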
Second, rewrite the report function by placing lapply or map inside it, so that report can do NSE:
report <- function(...){
  report_cat <- quos(...)
  map_df(report_cat, function(x) sample_data %>%
    group_by(!!x, YEAR) %>%
    summarize(num = n(), total = sum(AMOUNT)) %>%
    rename(REPORT_VALUE = !!x) %>%
    mutate(REPORT_CATEGORY = quo_name(x)))
}
By placing map_df inside report, you can take advantage of quos, which converts ... to a list of quosures. They are then fed into map_df and unquoted one by one using !!.
report(REPORT_CODE, PAYMENT_METHOD, INBOUND_CHANNEL, AMOUNT_CAT)
Another advantage of writing it like this is that you can also supply a character vector, convert it to symbols with syms, and splice them in using !!!, like the following:
report(!!!syms(cat.list))
Result:
# A tibble: 27 x 5
# Groups:   REPORT_VALUE [16]
   REPORT_VALUE YEAR    num total REPORT_CATEGORY
   <chr>        <chr> <int> <int> <chr>
 1 J            FY14      1    25 REPORT_CODE
 2 Q            FY16      1     1 REPORT_CODE
 3 Q            FY17      1   100 REPORT_CODE
 4 R            FY17      1    50 REPORT_CODE
 5 R            FY18      2    75 REPORT_CODE
 6 S            FY17      2   400 REPORT_CODE
 7 S            FY18      2   530 REPORT_CODE
 8 Check        FY14      1    25 PAYMENT_METHOD
 9 Check        FY17      1    50 PAYMENT_METHOD
10 Check        FY18      2    55 PAYMENT_METHOD
# ... with 17 more rows
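The quos and !!! mechanics can also be looked at in isolation. The helper capture_names below is hypothetical (it is not part of the answer's code); it captures its arguments the same way report does and returns their names as strings:

```r
library(rlang)

# Hypothetical helper: capture ... as quosures, then label each one.
capture_names <- function(...) {
  cats <- quos(...)
  unname(vapply(cats, quo_name, character(1)))
}

# Unquoted column names are captured, not evaluated.
bare <- capture_names(REPORT_CODE, PAYMENT_METHOD)

# A character vector can be turned into symbols and spliced in with !!!.
cat.list <- c("REPORT_CODE", "PAYMENT_METHOD", "INBOUND_CHANNEL", "AMOUNT_CAT")
spliced <- capture_names(!!!syms(cat.list))

print(bare)
print(spliced)
```

Both call styles reach the function as the same kind of list, which is why report can accept either.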
I'm not really a dplyr aficionado, but for what it's worth, here is how you could achieve this using library(data.table) instead:
library(data.table)
setDT(sample_data)  # convert sample_data to a data.table by reference
gen_report <- function(report_cat){
  sample_data[ , .(num = .N, total = sum(AMOUNT), REPORT_CATEGORY = report_cat),
               by = .(REPORT_VALUE = get(report_cat), YEAR)]
}
gen_report('REPORT_CODE')
lapply(cat.list, gen_report)
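If one combined table is preferred over a list, the lapply result can be stacked with rbindlist. A self-contained sketch, with a made-up sample_data standing in for the real one (an assumption, since the real data is not shown here):

```r
library(data.table)

# Made-up stand-in for sample_data.
sample_data <- data.table(
  REPORT_CODE    = c("J", "Q", "Q", "R"),
  PAYMENT_METHOD = c("Check", "Check", "Card", "Card"),
  YEAR           = c("FY14", "FY16", "FY16", "FY17"),
  AMOUNT         = c(25, 1, 100, 50)
)

gen_report <- function(report_cat) {
  sample_data[, .(num = .N, total = sum(AMOUNT), REPORT_CATEGORY = report_cat),
              by = .(REPORT_VALUE = get(report_cat), YEAR)]
}

cats <- c("REPORT_CODE", "PAYMENT_METHOD")

# rbindlist stacks the list of per-column reports into one data.table.
combined <- rbindlist(lapply(cats, gen_report))
print(combined)
```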