R aggregate by large number of columns

青春惊慌失措 2020-12-20 08:04

I have a data frame (df) that has about 40 columns, and I want to aggregate using a sum on 4 of the columns. Outside of the 4 I want to sum, each unique value in column 1 co…

4 Answers
  • 2020-12-20 08:17

    With dplyr, this can be done as follows (group_by_at() and summarise_all() still work, but they have been superseded since dplyr 1.0.0):

    library('dplyr')
    mytb <- read.table(text = "
    A B C D Sum
    1 A B C D   1
    2 A B C D   2
    3 A B C D   3
    4 E F 1 R   4
    5 E F 1 R   5", header = TRUE, stringsAsFactors = FALSE)
    
    # group by every column except Sum, then sum the remaining column
    mytb %>% 
      group_by_at(names(select(mytb, -Sum))) %>% 
      summarise_all(.funs = sum)
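    The question asks for sums on 4 columns rather than one. A minimal sketch of the same grouping pattern with several value columns, each summed separately, assuming dplyr >= 1.0.0 (for across(), all_of() and the .groups argument) and using three iris columns as a hypothetical stand-in for the question's data:

    ```r
    library(dplyr)

    # Hypothetical choice of value columns, standing in for the question's 4 columns
    cols <- c("Sepal.Length", "Sepal.Width", "Petal.Length")

    out <- iris %>%
      group_by_at(setdiff(names(iris), cols)) %>%             # group by every other column
      summarise(across(all_of(cols), sum), .groups = "drop")  # one summed column per value column
    out
    ```

    Each value column keeps its own total here; nothing is combined before aggregation.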
    
  • 2020-12-20 08:19

    Using the `agg` example data from @josilber's answer, here is another way to get the desired output with dplyr, which may be faster than aggregate() on very large datasets:

    library('dplyr')
    
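    # regroup(), summarise_each() and funs() from the original version of this
    # answer are deprecated or removed in current dplyr
    out <- agg %>% 
      group_by_at(setdiff(names(agg), "sum")) %>%   # group by every column except sum
      summarise_all(sum)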
    
    # Source: local data frame [27 x 3]
    # Groups: Species
    
    #  Species Petal.Width   sum
    #1      setosa         0.1  47.8
    #2      setosa         0.2 284.1
    #3      setosa         0.3  68.1
    #4      setosa         0.4  74.6
    #5      setosa         0.5  10.1
    #6      setosa         0.6  10.1
    #7  versicolor         1.0  79.9
    #8  versicolor         1.1  34.3
    #9  versicolor         1.2  63.8
    #10 versicolor         1.3 166.5
    #..        ...         ...   ...
    

    Using data.table:

    library('data.table')
    
    # setDT() converts agg to a data.table by reference (agg itself is modified)
    out <- setDT(agg)[, list(sum = sum(sum)), by = setdiff(names(agg), "sum")]
    
    #  Species Petal.Width   sum
    #1:     setosa         0.2 284.1
    #2:     setosa         0.4  74.6
    #3:     setosa         0.3  68.1
    #4:     setosa         0.1  47.8
    #5:     setosa         0.5  10.1
    #6:     setosa         0.6  10.1
    #7: versicolor         1.4  96.7
    #8: versicolor         1.5 136.5
    #9: versicolor         1.3 166.5
    #10: versicolor         1.6  42.0
    # ...
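    To sum several columns separately with data.table, .SDcols restricts .SD to the value columns and lapply() applies sum() to each of them within every group. A sketch, again treating three iris columns as hypothetical value columns:

    ```r
    library(data.table)

    # Hypothetical choice of value columns to sum separately
    cols <- c("Sepal.Length", "Sepal.Width", "Petal.Length")

    dt <- as.data.table(iris)   # a copy; iris itself is not modified
    out <- dt[, lapply(.SD, sum),
              by = setdiff(names(dt), cols),  # group by every other column
              .SDcols = cols]                 # .SD holds only the value columns
    out
    ```

    Using as.data.table() instead of setDT() here trades one copy of the data for leaving the original object untouched.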
    
  • 2020-12-20 08:26

    Use the data.frame method (aggregate.data.frame) like this:

    aggregate(df["field"], by = df[1:36], FUN = sum)
    

    or use the formula method (aggregate.formula) like this:

    nms <- c("field", names(df)[1:36])
    aggregate(field ~ ., df[nms], sum)
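    When more than one column should be summed, the formula method also accepts cbind() on the left-hand side, and `.` then stands for every column not already used in the formula. A sketch on a small hypothetical data frame (g1, g2, x and y are made-up names):

    ```r
    # Hypothetical toy data: two grouping columns (g1, g2) and two value columns (x, y)
    df <- data.frame(
      g1 = c("a", "a", "b", "b"),
      g2 = c("u", "u", "v", "v"),
      x  = c(1, 2, 3, 4),
      y  = c(10, 20, 30, 40)
    )

    # cbind() on the left sums x and y separately;
    # "." on the right groups by every remaining column (g1 and g2)
    res <- aggregate(cbind(x, y) ~ ., data = df, FUN = sum)
    res
    ```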
    

    In terms of the example data at the end of the question:

    Lines <- " A B C D Sum
    1 A B C D   1
    2 A B C D   2
    3 A B C D   3
    4 E F 1 R   4
    5 E F 1 R   5"
    df <- read.table(text = Lines, header = TRUE)
    
    # data.frame method
    aggregate(df["Sum"], df[1:4], sum)
    
    # data.frame method - alternative
    aggregate(df[5], df[-5], sum)
    
    # formula method
    aggregate(Sum ~., df, sum)
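    Selecting the grouping columns by name with setdiff() avoids hard-coded positions such as 1:36 or 1:4, which silently give wrong groups if columns are added or reordered. A sketch on the same example data:

    ```r
    Lines <- " A B C D Sum
    1 A B C D   1
    2 A B C D   2
    3 A B C D   3
    4 E F 1 R   4
    5 E F 1 R   5"
    df <- read.table(text = Lines, header = TRUE)

    value_col  <- "Sum"
    group_cols <- setdiff(names(df), value_col)  # every column except Sum

    # data.frame method: value column(s) first, grouping columns in `by`
    res <- aggregate(df[value_col], by = df[group_cols], FUN = sum)
    res
    ```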
    
  • 2020-12-20 08:32

    You are asking how to aggregate the sum of multiple variables, grouped by the remaining variables. I would do this by combining the multiple variables first and then aggregating using the (in my opinion) more convenient formula interface of the aggregate function. For instance, consider aggregating the sum of Sepal.Length, Sepal.Width, and Petal.Length in the iris dataset based on the remaining variables (Petal.Width and Species):

    agg <- iris
    cols <- c("Sepal.Length", "Sepal.Width", "Petal.Length")
    agg$sum <- rowSums(agg[,cols])
    agg <- agg[,!names(agg) %in% cols]
    aggregate(sum~., data=agg, FUN=sum)
    #    Petal.Width    Species   sum
    # 1          0.1     setosa  47.8
    # 2          0.2     setosa 284.1
    # 3          0.3     setosa  68.1
    # 4          0.4     setosa  74.6
    # 5          0.5     setosa  10.1
    # 6          0.6     setosa  10.1
    # 7          1.0 versicolor  79.9
    # 8          1.1 versicolor  34.3
    # 9          1.2 versicolor  63.8
    # 10         1.3 versicolor 166.5
    # 11         1.4 versicolor  96.7
    # 12         1.5 versicolor 136.5
    # 13         1.6 versicolor  42.0
    # 14         1.7 versicolor  14.7
    # 15         1.8 versicolor  13.9
    # 16         1.4  virginica  14.3
    # 17         1.5  virginica  27.4
    # 18         1.6  virginica  16.0
    # 19         1.7  virginica  11.9
    # 20         1.8  virginica 162.2
    # 21         1.9  virginica  71.7
    # 22         2.0  virginica  91.3
    # 23         2.1  virginica  94.4
    # 24         2.2  virginica  48.3
    # 25         2.3  virginica 125.6
    # 26         2.4  virginica  44.4
    # 27         2.5  virginica  48.2
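    Because addition can be regrouped, summing the combined column per group gives the same totals as summing each column per group and adding the results. The sketch below checks this on iris; the intermediate `separate` data frame is handy if the per-column totals should be kept as well:

    ```r
    cols <- c("Sepal.Length", "Sepal.Width", "Petal.Length")
    group_cols <- setdiff(names(iris), cols)  # "Petal.Width", "Species"

    # Approach from the answer above: combine first, then aggregate once
    agg <- iris[group_cols]
    agg$sum <- rowSums(iris[cols])
    combined <- aggregate(sum ~ ., data = agg, FUN = sum)

    # Alternative: aggregate each column separately, then add the per-column totals
    separate <- aggregate(iris[cols], by = iris[group_cols], FUN = sum)
    separate$sum <- rowSums(separate[cols])

    # Both group by the same columns in the same order, so the rows line up
    all.equal(combined$sum, separate$sum)
    ```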
    