Succinct way to summarize different columns with different functions

后端 未结 4 1298
既然无缘
既然无缘 2021-01-13 15:21

My question builds on a similar one by imposing an additional constraint that the name of each variable should appear only once.

Consider a data frame



        
4条回答
  •  爱一瞬间的悲伤
    2021-01-13 15:52

    I propose 2 tricks to solve this issue, see the code and some details for both solutions at the bottom :

    A function .at that returns results for for groups of variables (here only one variable by group) that we can then unsplice, so we benefit from both worlds, summarize and summarize_at :

    df %>% summarize(
      !!!.at(vars(potentially_long_name_i_dont_want_to_type_twice), mean),
      !!!.at(vars(another_annoyingly_long_name), sum))
    
    # # A tibble: 1 x 2
    #     potentially_long_name_i_dont_want_to_type_twice another_annoyingly_long_name
    #                                                                       
    #   1                                             5.5                          255
    

    An adverb to summarize, with a dollar notation shorthand.

    df %>%
      ..flx$summarize(potentially_long_name_i_dont_want_to_type_twice = ~mean(.),
                      another_annoyingly_long_name = ~sum(.))
    
    # # A tibble: 1 x 2
    #     potentially_long_name_i_dont_want_to_type_twice another_annoyingly_long_name
    #                                                                       
    #   1                                             5.5                          255
    

    code for .at

    It has to be used in a pipe because it uses the . in the parent environment, messy but it works.

    .at <- function(.vars, .funs, ...) {
      in_a_piped_fun <- exists(".",parent.frame()) &&
        length(ls(envir=parent.frame(), all.names = TRUE)) == 1
      if (!in_a_piped_fun)
        stop(".at() must be called as an argument to a piped function")
      .tbl <- try(eval.parent(quote(.)))
      dplyr:::manip_at(
        .tbl, .vars, .funs, rlang::enquo(.funs), rlang:::caller_env(),
        .include_group_vars = TRUE, ...)
    }
    

    I designed it to combine summarize and summarize_at :

    df %>% summarize(
      !!!.at(vars(potentially_long_name_i_dont_want_to_type_twice), list(foo=min, bar = max)),
      !!!.at(vars(another_annoyingly_long_name), median))
    
    # # A tibble: 1 x 3
    #       foo   bar another_annoyingly_long_name
    #                              
    #   1     1    10                         25.5
    

    code for ..flx

    ..flx outputs a function that replaces its formula arguments such as a = ~mean(.) by calls a = purrr::as_mapper(~mean(.))(a) before running. Convenient with summarize and mutate because a column cannot be a formula so there can't be any conflict.

    I like to use the dollar notation as a shorthand and to have names starting with .. so I can name those "tags" (and give them a class "tag") and see them as different objects (still experimenting with this). ..flx(summarize)(...) will work as well though.

    ..flx <- function(fun){
      function(...){
        mc <- match.call()
        mc[[1]] <- tail(mc[[1]],1)[[1]]
        mc[] <- imap(mc,~if(is.call(.) && identical(.[[1]],quote(`~`))) {
          rlang::expr(purrr::as_mapper(!!.)(!!sym(.y))) 
        } else .)
        eval.parent(mc)
      }
    }
    
    class(..flx) <- "tag"
    
    `$.tag` <- function(e1, e2){
      # change original call so x$y, which is `$.tag`(tag=x, data=y), becomes x(y)
      mc <- match.call()
      mc[[1]] <- mc[[2]]
      mc[[2]] <- NULL
      names(mc) <- NULL
      # evaluate it in parent env
      eval.parent(mc)
    }
    

提交回复
热议问题