I'm struggling to find an efficient solution for the following problem:
I have a large manipulated data frame with around 8 columns and 80000 rows that generally i
This is a good fit for the split-apply-combine paradigm. First, split your data frame into one piece per company/year pair:
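data = data.frame(company.raw = c("C1", "C1", "C2", "C2", "C2", "C2"),
                  years.raw = c(1, 1, 1, 1, 2, 2),
                  source = c("Ink", "Recycling", "Coffee", "Combusted", "Printer", "Tea"),
                  amount.inkg = c(5, 2, 10, 15, 14, 18))
# The category vectors from the question are not shown here; these values are
# assumed, chosen only to be consistent with the output below
vector1 = c("Coffee", "Tea")
vector2 = c("Ink", "Printer")
vector3 = c("Recycling", "Combusted")
spl = split(data, paste(data$company.raw, data$years.raw))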
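To see what the split step produces, here is a small sketch that inspects `spl` on the same example data: `split()` returns a named list with one sub-data-frame per company/year key.

```r
# Same example data as above
data = data.frame(company.raw = c("C1", "C1", "C2", "C2", "C2", "C2"),
                  years.raw = c(1, 1, 1, 1, 2, 2),
                  source = c("Ink", "Recycling", "Coffee", "Combusted", "Printer", "Tea"),
                  amount.inkg = c(5, 2, 10, 15, 14, 18))
spl = split(data, paste(data$company.raw, data$years.raw))

# One list element per pasted company/year key
names(spl)         # [1] "C1 1" "C2 1" "C2 2"
# Each element is an ordinary data frame holding that group's rows
sapply(spl, nrow)  # 2 rows in each group for this example
spl[["C2 2"]]      # the Printer and Tea rows for company C2, year 2
```

`split()` also accepts a list of grouping vectors, e.g. `split(data, list(data$company.raw, data$years.raw), drop = TRUE)`, which avoids building a pasted string key; the group names then use `"."` as the separator instead of a space.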
Next, compute a one-row summary data frame for each element of the split data:
spl2 = lapply(spl, function(x) {
  # x is the sub-data-frame for one company/year pair;
  # sum the amounts whose source falls in each category vector
  data.frame(Company = x$company.raw[1],
             Year = x$years.raw[1],
             amount.vector1 = sum(x$amount.inkg[x$source %in% vector1]),
             amount.vector2 = sum(x$amount.inkg[x$source %in% vector2]),
             amount.vector3 = sum(x$amount.inkg[x$source %in% vector3]))
})
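The `lapply()` step above hard-codes one column per category vector. If there are more categories, a variant can loop over a named list of them instead. This is a minimal sketch; the list `categories` and its contents are hypothetical, chosen only so the sums agree with the output shown at the end of this answer.

```r
data = data.frame(company.raw = c("C1", "C1", "C2", "C2", "C2", "C2"),
                  years.raw = c(1, 1, 1, 1, 2, 2),
                  source = c("Ink", "Recycling", "Coffee", "Combusted", "Printer", "Tea"),
                  amount.inkg = c(5, 2, 10, 15, 14, 18))
# Hypothetical category definitions (assumed, not from the question)
categories = list(vector1 = c("Coffee", "Tea"),
                  vector2 = c("Ink", "Printer"),
                  vector3 = c("Recycling", "Combusted"))
spl = split(data, paste(data$company.raw, data$years.raw))

spl2 = lapply(spl, function(x) {
  # One sum per category, instead of one hand-written column each
  sums = vapply(categories,
                function(v) sum(x$amount.inkg[x$source %in% v]),
                numeric(1))
  names(sums) = paste0("amount.", names(categories))
  # as.list() turns the named sums into separate data frame columns
  data.frame(Company = x$company.raw[1], Year = x$years.raw[1], as.list(sums))
})
res = do.call(rbind, spl2)
res
```

Adding a category then only means adding an entry to `categories`; the function body stays the same.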
Finally, combine the pieces into one data frame:
do.call(rbind, spl2)
#      Company Year amount.vector1 amount.vector2 amount.vector3
# C1 1      C1    1              0              5              2
# C2 1      C2    1             10              0             15
# C2 2      C2    2             18             14              0
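With around 80,000 rows, building one small data frame per group can become slow. One alternative, sketched here and not benchmarked, keeps the split-apply-combine idea but lets a single `tapply()` call do the grouping: each `source` is first mapped to its category, and then the amounts are summed per company/year key and category in one pass. The `categories` list is again a hypothetical stand-in for the question's vectors, and `default = 0` (available in R 3.4.0 and later) fills empty cells with 0 instead of `NA`.

```r
data = data.frame(company.raw = c("C1", "C1", "C2", "C2", "C2", "C2"),
                  years.raw = c(1, 1, 1, 1, 2, 2),
                  source = c("Ink", "Recycling", "Coffee", "Combusted", "Printer", "Tea"),
                  amount.inkg = c(5, 2, 10, 15, 14, 18))
# Hypothetical category definitions (assumed, not from the question)
categories = list(vector1 = c("Coffee", "Tea"),
                  vector2 = c("Ink", "Printer"),
                  vector3 = c("Recycling", "Combusted"))

# Lookup table: source name -> category name (e.g. "Tea" -> "vector1")
lookup = setNames(rep(names(categories), lengths(categories)), unlist(categories))
# Sources in no category become NA and are ignored by tapply();
# fixed levels keep a column for every category even if it never occurs
category = factor(unname(lookup[data$source]), levels = names(categories))
key = paste(data$company.raw, data$years.raw)

# Sum amounts per (company/year key, category) cell in one call
tab = tapply(data$amount.inkg, list(key, category), sum, default = 0)
colnames(tab) = paste0("amount.", colnames(tab))

# Recover Company and Year for each matrix row from the first matching data row
idx = match(rownames(tab), key)
out = data.frame(Company = data$company.raw[idx], Year = data$years.raw[idx], tab)
out
```

The result has the same columns and values as the `do.call(rbind, spl2)` output above; the difference is that no per-group data frames are created, which is usually where the `lapply()` version spends its time.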