How do I sum the values of columns in several tables if tables have different lengths?

离开以前 2021-01-17 23:35

Alright, this should be an easy one but I'm looking for a solution that's as fast as possible.

Let's say I have 3 tables (the number of tables will be much larger

3 Answers
  • 2021-01-18 00:17

    You can try this: stack the tables as one-column matrices, then sum by row name.

    df <- rbind(as.matrix(tab1), as.matrix(tab2), as.matrix(tab3))
    aggregate(df, by=list(row.names(df)), FUN=sum)
    #   Group.1 V1
    # 1       1  7
    # 2       2  3
    # 3       3  4
    # 4       4  3
    # 5       5  1
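
    For anyone who wants to run this end to end, here is a minimal sketch. The question's data is not shown above, so the three tables below are hypothetical, chosen only so that the totals match the printed output. `V1` is the name R gives the unnamed matrix column.

    ```r
    # Hypothetical input: three frequency tables of different lengths
    # (an assumption, not the question's actual data).
    tab1 <- table(c(1, 1, 1, 2, 3, 4))       # labels 1..4
    tab2 <- table(c(1, 1, 2, 3, 3, 4, 5))    # labels 1..5
    tab3 <- table(c(1, 1, 2, 3, 4))          # labels 1..4

    # Stack the tables as one-column matrices; the row names carry the labels.
    df <- rbind(as.matrix(tab1), as.matrix(tab2), as.matrix(tab3))

    # Sum the counts within each label.
    res <- aggregate(df, by = list(row.names(df)), FUN = sum)
    print(res)
    ```

    Note that `Group.1` holds the labels as character strings, so convert them with `as.integer()` if numeric labels are needed downstream.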
    
  • 2021-01-18 00:33

    Concatenate the table outputs with c() to create 'v1', then use tapply() to sum the elements grouped by the names of that vector.

    v1 <- c(tab1, tab2, tab3)
    tapply(v1, names(v1), FUN=sum)
    #1 2 3 4 5 
    #7 3 4 3 1 
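
    Since the question mentions that the real number of tables is much larger, the same idea can be sketched for a list of tables. The list `tabs` and its contents are hypothetical; `do.call(c, ...)` is just the list form of `c(tab1, tab2, tab3)`.

    ```r
    # Hypothetical input: any number of frequency tables held in an unnamed list.
    tabs <- list(
      table(c(1, 1, 1, 2, 3, 4)),
      table(c(1, 1, 2, 3, 3, 4, 5)),
      table(c(1, 1, 2, 3, 4))
    )

    # Concatenate every table into one named integer vector.
    v <- do.call(c, tabs)

    # Sum the counts that share a name.
    totals <- tapply(v, names(v), FUN = sum)

    # tapply() orders groups as character strings ("10" sorts before "2"),
    # so reorder numerically; this assumes the labels are integers.
    totals <- totals[order(as.integer(names(totals)))]
    print(totals)
    ```

    Keeping the list unnamed matters here: with a named list, `c()` would prefix each element name with the list name and the grouping would no longer line up.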
    
  • 2021-01-18 00:34

    You could use rowsum(). The output is shaped slightly differently from what you show (a one-column matrix rather than a named vector), but you can always restructure it after the calculation. rowsum() is known to be very efficient.

    x <- c(tab1, tab2, tab3)
    rowsum(x, names(x))
    #   [,1]
    # 1    7
    # 2    3
    # 3    4
    # 4    3
    # 5    1
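
    As one possible way to do that restructuring, the one-column matrix can be turned back into a named vector by extracting its single column. The three tables are hypothetical stand-ins for the question's data.

    ```r
    # Hypothetical input tables (an assumption, not the question's data).
    tab1 <- table(c(1, 1, 1, 2, 3, 4))
    tab2 <- table(c(1, 1, 2, 3, 3, 4, 5))
    tab3 <- table(c(1, 1, 2, 3, 4))

    x <- c(tab1, tab2, tab3)

    # rowsum() returns a matrix with one row per group and one column.
    res <- rowsum(x, names(x))

    # Extracting the column drops the matrix dimension and keeps the
    # row names as the names of the resulting vector.
    totals <- res[, 1]
    print(totals)
    ```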
    

    Here's a benchmark with akrun's data.table suggestion added in as well.

    library(microbenchmark)
    library(data.table)
    
    xx <- rep(x, 1e5)
    
    microbenchmark(
        tapply = tapply(xx, names(xx), FUN=sum),
        rowsum = rowsum(xx, names(xx)),
        data.table = data.table(xx, names(xx))[, sum(xx), by = V2]
    )
    # Unit: milliseconds
    #        expr       min        lq      mean    median        uq       max neval
    #      tapply 150.47532 154.80200 176.22410 159.02577 204.22043 233.34346   100
    #      rowsum  41.28635  41.65162  51.85777  43.33885  45.43370 109.91777   100
    #  data.table  21.39438  24.73580  35.53500  27.56778  31.93182  92.74386   100
    