Efficient conditional summing by multiple conditions in R

隐瞒了意图╮ 2021-01-28 08:45

I'm struggling to find an efficient solution to the following problem:

I have a large manipulated data frame with around 8 columns and 80000 rows that generally i

2 Answers
  •  -上瘾入骨i
    2021-01-28 09:26

    This is a good task for the split-apply-combine paradigm. First, you split your data frame by company/year pair:

    data = data.frame(company.raw = c("C1", "C1", "C2", "C2", "C2", "C2"),
                      years.raw = c(1, 1, 1, 1, 2, 2),
                      source = c("Ink", "Recycling", "Coffee", "Combusted", "Printer", "Tea"),
                      amount.inkg = c(5, 2, 10, 15, 14, 18))
    spl = split(data, paste(data$company.raw, data$years.raw))
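
    To see what the split step produces, it can help to inspect the result. The sketch below rebuilds the same example data and looks at the names of the pieces and the size of one of them; nothing here goes beyond base R's split().

    ```r
    # Same example data as above.
    data <- data.frame(company.raw = c("C1", "C1", "C2", "C2", "C2", "C2"),
                       years.raw = c(1, 1, 1, 1, 2, 2),
                       source = c("Ink", "Recycling", "Coffee", "Combusted", "Printer", "Tea"),
                       amount.inkg = c(5, 2, 10, 15, 14, 18))

    # One list element per distinct "company year" key.
    spl <- split(data, paste(data$company.raw, data$years.raw))

    names(spl)
    # [1] "C1 1" "C2 1" "C2 2"

    # Each element is a data frame holding only the rows for that key.
    nrow(spl[["C2 1"]])
    # [1] 2
    ```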
    

    Now, you compute the rolled-up data frame for each element in the split-up data:

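    # vector1..vector3 are the source groupings from the question. The values
    # here are an assumption, inferred from the output shown below.
    vector1 = c("Coffee", "Tea")
    vector2 = c("Ink", "Printer")
    vector3 = c("Recycling", "Combusted")
    spl2 = lapply(spl, function(x) {
      data.frame(Company=x$company.raw[1],
                 Year=x$years.raw[1],
                 amount.vector1 = sum(x$amount.inkg[x$source %in% vector1]),
                 amount.vector2 = sum(x$amount.inkg[x$source %in% vector2]),
                 amount.vector3 = sum(x$amount.inkg[x$source %in% vector3]))
    })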
    

    And finally, combine everything together:

    do.call(rbind, spl2)
    #      Company Year amount.vector1 amount.vector2 amount.vector3
    # C1 1      C1    1              0              5              2
    # C2 1      C2    1             10              0             15
    # C2 2      C2    2             18             14              0
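
    If the per-group lapply turns out to be slow on the full data, one possible alternative is to build a matrix of per-group amounts and sum it by key with base R's rowsum(). This is only a sketch, not the method used in this answer: the names groups, sums, key and totals are made up for illustration, the contents of the three vectors are assumed from the output above, and it has not been timed against the 80000-row data.

    ```r
    # Same example data as above.
    data <- data.frame(company.raw = c("C1", "C1", "C2", "C2", "C2", "C2"),
                       years.raw = c(1, 1, 1, 1, 2, 2),
                       source = c("Ink", "Recycling", "Coffee", "Combusted", "Printer", "Tea"),
                       amount.inkg = c(5, 2, 10, 15, 14, 18))

    # Assumed groupings, inferred from the output shown above.
    groups <- list(vector1 = c("Coffee", "Tea"),
                   vector2 = c("Ink", "Printer"),
                   vector3 = c("Recycling", "Combusted"))

    # One column per group: the amount if the row's source is in that group, else 0.
    sums <- sapply(groups, function(g) data$amount.inkg * (data$source %in% g))

    # Sum each column within every company/year key.
    key <- paste(data$company.raw, data$years.raw)
    totals <- rowsum(sums, group = key)

    totals
    #      vector1 vector2 vector3
    # C1 1       0       5       2
    # C2 1      10       0      15
    # C2 2      18      14       0
    ```

    The result is a matrix whose row names are the company/year keys, so it lines up with the row names of the combined data frame above.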
    
