Grouping Over All Possible Combinations of Several Variables With dplyr

最后都变了- 提交于 2019-12-18 16:34:35

问题


Given a situation such as the following

library(dplyr)
myData <- tbl_df(data.frame( var1 = rnorm(100), 
                             var2 = letters[1:3] %>%
                                    sample(100, replace = TRUE) %>%
                                    factor(), 
                             var3 = LETTERS[1:3] %>%
                                    sample(100, replace = TRUE) %>%
                                    factor(), 
                             var4 = month.abb[1:3] %>%
                                    sample(100, replace = TRUE) %>%
                                    factor()))

I would like to group `myData' to eventually find summary data grouping by all possible combinations of var2, var3, and var4.

I can create a list with all possible combinations of variables as character values with

groupNames <- names(myData)[2:4]

myGroups <- Map(combn, 
              list(groupNames), 
              seq_along(groupNames),
              simplify = FALSE) %>%
              unlist(recursive = FALSE)

My plan was to make separate data sets for each variable combination with a for() loop, something like

### This Does Not Work
for (i in 1:length(myGroups)){
     assign( myGroups[i]%>%
             unlist() %>%
             paste0(collapse = "")%>%
             paste0("Data"), 
               myData %>% 
               group_by_(lapply(myGroups[[i]], as.symbol)) %>%
               summarise( n = length(var1), 
                             avgVar2 = var2 %>%
                                       mean()))
}

Admittedly I am not very good with lists, and looking up this issue was a bit challenging since dpyr updates have altered how grouping works a bit.

If there is a better way to do this than separate data sets I would love to know.

I've gotten a loop similar to above working when I am only grouping by a single variable.

Any and all help is greatly appreciated! Thank you!


回答1:


This seems convulated, and there's probably a way to simplify or fancy it up with a do, but it works. Using your myData and myGroups,

results = lapply(myGroups, FUN = function(x) {
    do.call(what = group_by_, args = c(list(myData), x)) %>%
        summarise( n = length(var1), 
                   avgVar1 = mean(var1))
    }
)

> results[[1]]
Source: local data frame [3 x 3]

  var2  n     avgVar1
1    a 31  0.38929738
2    b 31 -0.07451717
3    c 38 -0.22522129

> results[[4]]
Source: local data frame [9 x 4]
Groups: var2

  var2 var3  n    avgVar1
1    a    A 11 -0.1159160
2    a    B 11  0.5663312
3    a    C  9  0.7904056
4    b    A  7  0.0856384
5    b    B 13  0.1309756
6    b    C 11 -0.4192895
7    c    A 15 -0.2783099
8    c    B 10 -0.1110877
9    c    C 13 -0.2517602

> results[[7]]
# I won't paste them here, but it has all 27 rows, grouped by var2, var3 and var4.

I changed your summarise call to average var1 since var2 isn't numeric.




回答2:


I have created a function based on the answer of @Gregor and the comments that followed:

library(magrittr)
myData <- tbl_df(data.frame( var1 = rnorm(100), 
                         var2 = letters[1:3] %>%
                                sample(100, replace = TRUE) %>%
                                factor(), 
                         var3 = LETTERS[1:3] %>%
                                sample(100, replace = TRUE) %>%
                                factor(), 
                         var4 = month.abb[1:3] %>%
                                sample(100, replace = TRUE) %>%
                                factor()))

Function combSummarise

combSummarise <- function(data, variables=..., summarise=...){


  # Get all different combinations of selected variables (credit to @Michael)
    myGroups <- lapply(seq_along(variables), function(x) {
    combn(c(variables), x, simplify = FALSE)}) %>%
    unlist(recursive = FALSE)

  # Group by selected variables (credit to @konvas)
    df <- eval(parse(text=paste("lapply(myGroups, function(x){
               dplyr::group_by_(data, .dots=x) %>% 
               dplyr::summarize_( \"", paste(summarise, collapse="\",\""),"\")})"))) %>% 
          do.call(plyr::rbind.fill,.)

    groupNames <- c(myGroups[[length(myGroups)]])
    newNames <- names(df)[!(names(df) %in% groupNames)]

    df <- cbind(df[, groupNames], df[, newNames])
    names(df) <- c(groupNames, newNames)
    df

}

Call of combSummarise

combSummarise (myData, var=c("var2", "var3", "var4"), 
               summarise=c("length(var1)", "mean(var1)", "max(var1)"))

or

combSummarise (myData, var=c("var2", "var4"), 
               summarise=c("length(var1)", "mean(var1)", "max(var1)"))

or

combSummarise (myData, var=c("var2", "var4"), 
           summarise=c("length(var1)"))

etc




回答3:


Inspired by the answers by Gregor and dimitris_ps, I wrote a dplyr style function that runs summarise for all combinations of group variables.

summarise_combo <- function(data, ...) {

  groupVars <- group_vars(data) %>% map(as.name)

  groupCombos <-  map( 0:length(groupVars), ~combn(groupVars, ., simplify=FALSE) ) %>%
    unlist(recursive = FALSE)

  results <- groupCombos %>% 
    map(function(x) {data %>% group_by(!!! x) %>% summarise(...)} ) %>%
    bind_rows()

  results %>% select(!!! groupVars, everything())
}

Example

library(tidyverse)
mtcars %>% group_by(cyl, vs) %>% summarise_combo(cyl_n = n(), mean(mpg))


来源:https://stackoverflow.com/questions/28992028/grouping-over-all-possible-combinations-of-several-variables-with-dplyr

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!