Calculating subtotals in R

北荒 2021-02-15 11:28

I have a data frame with 900,000 rows and 11 columns in R. The column names and types are as follows:

Column names: date / mcode / mname / ycode / yname / yissue

7 answers
  •  不思量自难忘°
    2021-02-15 11:57

    If your data is large and speed matters, I would recommend the base R function rowsum, which is considerably faster. I benchmarked the three methods suggested in the other answers (f1 = aggregate, f2 = ddply, f3 = tapply) against f4 = rowsum. Here is what I found (elapsed is the total time in seconds for 100 replications; relative is the ratio to the fastest method):

       test replications elapsed relative
    4 f4()          100   0.033     1.00
    3 f3()          100   0.046     1.39
    1 f1()          100   0.165     5.00
    2 f2()          100   0.605    18.33
    

    I have added my code below in case someone wants to explore in more detail.

    library(plyr)        # ddply(), summarise
    library(rbenchmark)  # benchmark()
    
    # Toy data: 50 random values in 5 groups of 10
    val  <- rnorm(50)
    name <- rep(letters[1:5], each = 10)
    data <- data.frame(val, name)
    
    # Four ways to compute the per-group sum of val
    f1 <- function() aggregate(data$val, by = list(data$name), FUN = sum)
    f2 <- function() ddply(data, .(name), summarise, sum = sum(val))
    f3 <- function() tapply(data$val, data$name, sum)
    f4 <- function() rowsum(x = data$val, group = data$name)
    
    # Time 100 runs of each and rank by relative elapsed time
    benchmark(f1(), f2(), f3(), f4(),
              columns = c("test", "replications", "elapsed", "relative"),
              order = "relative", replications = 100)
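
One thing to keep in mind is that rowsum returns a one-column matrix keyed by row names rather than a data frame. The following is a minimal sketch, not part of the benchmark above, of how the result might be turned into a data frame and cross-checked against tapply. The seed and the names totals, totals_df, and by_tapply are illustrative assumptions; only base R is used.

```r
# Illustrative data with the same shape as the benchmark data
# (the seed is an assumption, added only for reproducibility).
set.seed(1)
val  <- rnorm(50)
name <- rep(letters[1:5], each = 10)
data <- data.frame(val, name)

# rowsum() returns a one-column matrix; row names are the group labels.
totals <- rowsum(x = data$val, group = data$name)

# Convert to a data frame if a tabular result is more convenient.
totals_df <- data.frame(name = rownames(totals),
                        sum  = totals[, 1],
                        row.names = NULL)

# Cross-check against tapply(): both should give the same per-group sums.
by_tapply <- tapply(data$val, data$name, sum)
same <- isTRUE(all.equal(unname(totals[, 1]), as.vector(by_tapply)))
print(totals_df)
print(same)
```

If more than one numeric column needs a subtotal, rowsum also accepts a matrix or data frame of numeric columns as x and sums each column within each group.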
    
