My question builds on a similar one by imposing an additional constraint that the name of each variable should appear only once.
Consider a data frame
Here's a hacky function that uses unexported functions from dplyr so it is not future proof, but you can specify a different summary for each column.
summarise_with <- function(.tbl, .funs) {
funs <- enquo(.funs)
syms <- syms(tbl_vars(.tbl))
calls <- dplyr:::as_fun_list(.funs, funs, caller_env())
stopifnot(length(syms)==length(calls))
cols <- purrr::map2(calls, syms, ~dplyr:::expr_substitute(.x, quote(.), .y))
cols <- purrr::set_names(cols, purrr::map_chr(syms, rlang::as_string))
summarize(.tbl, !!!cols)
}
Then you could do
df %>% summarise_with(list(mean, sum))
and not have to type the column names at all.
It seems like you can use map2
for this.
map2_dfc( df[v], f, ~.y(.x))
# # A tibble: 1 x 2
# potentially_long_name_i_dont_want_to_type_twice another_annoyingly_long_name
# <dbl> <int>
# 1 5.5 255
I propose 2 tricks to solve this issue, see the code and some details for both solutions at the bottom :
A function .at
that returns results for for groups of variables (here only one variable by group) that we can then unsplice, so we benefit from both worlds, summarize
and summarize_at
:
df %>% summarize(
!!!.at(vars(potentially_long_name_i_dont_want_to_type_twice), mean),
!!!.at(vars(another_annoyingly_long_name), sum))
# # A tibble: 1 x 2
# potentially_long_name_i_dont_want_to_type_twice another_annoyingly_long_name
# <dbl> <dbl>
# 1 5.5 255
An adverb to summarize
, with a dollar notation shorthand.
df %>%
..flx$summarize(potentially_long_name_i_dont_want_to_type_twice = ~mean(.),
another_annoyingly_long_name = ~sum(.))
# # A tibble: 1 x 2
# potentially_long_name_i_dont_want_to_type_twice another_annoyingly_long_name
# <dbl> <int>
# 1 5.5 255
code for .at
It has to be used in a pipe because it uses the .
in the parent environment, messy but it works.
.at <- function(.vars, .funs, ...) {
in_a_piped_fun <- exists(".",parent.frame()) &&
length(ls(envir=parent.frame(), all.names = TRUE)) == 1
if (!in_a_piped_fun)
stop(".at() must be called as an argument to a piped function")
.tbl <- try(eval.parent(quote(.)))
dplyr:::manip_at(
.tbl, .vars, .funs, rlang::enquo(.funs), rlang:::caller_env(),
.include_group_vars = TRUE, ...)
}
I designed it to combine summarize
and summarize_at
:
df %>% summarize(
!!!.at(vars(potentially_long_name_i_dont_want_to_type_twice), list(foo=min, bar = max)),
!!!.at(vars(another_annoyingly_long_name), median))
# # A tibble: 1 x 3
# foo bar another_annoyingly_long_name
# <dbl> <dbl> <dbl>
# 1 1 10 25.5
code for ..flx
..flx
outputs a function that replaces its formula arguments such as a = ~mean(.)
by calls a = purrr::as_mapper(~mean(.))(a)
before running. Convenient with summarize
and mutate
because a column cannot be a formula so there can't be any conflict.
I like to use the dollar notation as a shorthand and to have names starting with ..
so I can name those "tags" (and give them a class "tag"
) and see them as different objects (still experimenting with this). ..flx(summarize)(...)
will work as well though.
..flx <- function(fun){
function(...){
mc <- match.call()
mc[[1]] <- tail(mc[[1]],1)[[1]]
mc[] <- imap(mc,~if(is.call(.) && identical(.[[1]],quote(`~`))) {
rlang::expr(purrr::as_mapper(!!.)(!!sym(.y)))
} else .)
eval.parent(mc)
}
}
class(..flx) <- "tag"
`$.tag` <- function(e1, e2){
# change original call so x$y, which is `$.tag`(tag=x, data=y), becomes x(y)
mc <- match.call()
mc[[1]] <- mc[[2]]
mc[[2]] <- NULL
names(mc) <- NULL
# evaluate it in parent env
eval.parent(mc)
}
Use .[[i]]
and !!names(.)[i]:=
to refer to the ith column and its name.
library(tibble)
library(dplyr)
library(rlang)
df %>% summarize(!!names(.)[1] := mean(.[[1]]), !!names(.)[2] := sum(.[[2]]))
giving:
# A tibble: 1 x 2
potentially_long_name_i_dont_want_to_type_twice another_annoyingly_long_name
<dbl> <int>
1 5.5 255
If df were grouped (it is not in the question so this is not needed) then surround summarize
with a do
like this:
library(dplyr)
library(rlang)
library(tibble)
df2 <- tibble(a = 1:10, b = 11:20, g = rep(1:2, each = 5))
df2 %>%
group_by(g) %>%
do(summarize(., !!names(.)[1] := mean(.[[1]]), !!names(.)[2] := sum(.[[2]]))) %>%
ungroup
giving:
# A tibble: 2 x 3
g a b
<int> <dbl> <int>
1 1 3 65
2 2 8 90