How can I apply different aggregate functions to different columns in R?

Submitted by 匆匆过客 on 2020-01-03 02:27:31

Question


How can I apply different aggregate functions to different columns in R? The aggregate() function accepts only a single function argument (FUN):

V1  V2        V3
1   18.45022  62.24411694
2   90.34637  20.86505214
1   50.77358  27.30074987
2   52.95872  30.26189013
1   61.36935  26.90993530
2   49.31730  70.60387016
1   43.64142  87.64433517
2   36.19730  83.47232907
1   91.51753  0.03056485
... ...       ...

> aggregate(sample,by=sample["V1"],FUN=sum)
  V1 V1       V2       V3
1  1 10 578.5299 489.5307
2  2 20 575.2294 527.2222

How can I apply a different function to each column, i.e. aggregate V2 with the mean() function and V3 with the sum() function, without calling aggregate() multiple times?
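To make the limitation reproducible, here is a minimal sketch using small hypothetical values in place of the question's data (the name x and the numbers are assumptions, not from the original). The formula interface of aggregate() groups by V1, but whatever single function is passed as FUN is applied to every column on the left-hand side.

```r
# Small hypothetical data standing in for the question's `sample`
x <- data.frame(V1 = c(1, 2, 1, 2),
                V2 = c(10, 20, 30, 40),
                V3 = c(1, 2, 3, 5))

# Group by V1; the single FUN (sum) is applied to both V2 and V3
res <- aggregate(cbind(V2, V3) ~ V1, data = x, FUN = sum)
res
```

There is no per-column FUN here, which is why the answers turn to other tools.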


Answer 1:


For that task, I would use ddply() from the plyr package:

> library(plyr)
> ddply(sample, .(V1), summarize, V2 = sum(V2), V3 = mean(V3))
  V1       V2       V3
1  1 578.5299 48.95307
2  2 575.2294 52.72222
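A self-contained version of the ddply() call, assuming plyr is installed and using small hypothetical data (x and the output names sumV2/meanV3 are illustrative choices, not from the original). Each named argument after summarize becomes one output column, computed separately within each V1 group, so every column can get its own function.

```r
library(plyr)  # assumes the plyr package is installed

# Small hypothetical data standing in for the question's `sample`
x <- data.frame(V1 = c(1, 2, 1, 2),
                V2 = c(10, 20, 30, 40),
                V3 = c(1, 2, 3, 5))

# summarize evaluates each named expression within each V1 group
res <- ddply(x, .(V1), summarize, sumV2 = sum(V2), meanV3 = mean(V3))
res
```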



Answer 2:


...Or use data.table(), from the package of the same name:

library(data.table)

myDT <- data.table(sample) # As mdsumner suggested, this is not a great name

myDT[, list(sumV2 = sum(V2), meanV3 = mean(V3)), by = V1]

#      V1    sumV2   meanV3
# [1,]  1 578.5299 48.95307
# [2,]  2 575.2294 52.72222
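A self-contained sketch of the same data.table idiom, assuming the data.table package is installed and using small hypothetical data whose groups are deliberately out of order. It shows two details: .() is data.table's shorthand for list(), and keyby = sorts the result by the grouping column, whereas by = keeps groups in order of first appearance.

```r
library(data.table)  # assumes the data.table package is installed

# Small hypothetical data; V1 = 2 appears first on purpose
dt <- data.table(V1 = c(2, 1, 2, 1),
                 V2 = c(20, 10, 40, 30),
                 V3 = c(2, 1, 5, 3))

# One expression per output column; keyby = also sorts the result by V1
res <- dt[, .(sumV2 = sum(V2), meanV3 = mean(V3)), keyby = V1]
res
```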



Answer 3:


Let's call the data frame x rather than sample, since sample is already the name of a base R function.

EDIT:

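The by() function provides a more direct route than split/apply/combine (here f is the per-group function, e.g. myfunkyfunctionthatdoesadifferentthingforeachcolumn defined below):

by(x, list(x$V1), f)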

:EDIT
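A self-contained sketch of the by() route, with small hypothetical data and a hypothetical helper named sum_v2_mean_v3 playing the role of f. by() calls the helper on each V1 sub-data-frame; because each call returns a named vector, do.call(rbind, ...) can stack the per-group results into a matrix with one row per group.

```r
# Small hypothetical data standing in for the question's `sample`
x <- data.frame(V1 = c(1, 2, 1, 2),
                V2 = c(10, 20, 30, 40),
                V3 = c(1, 2, 3, 5))

# Hypothetical per-group helper: sum of V2 and mean of V3, as a named vector
sum_v2_mean_v3 <- function(d) c(sum = sum(d$V2), mean = mean(d$V3))

# by() splits x on V1 and applies the helper to each subset
b <- by(x, x$V1, sum_v2_mean_v3)

# Stack the per-group vectors: one row per group, columns "sum" and "mean"
res <- do.call(rbind, b)
res
```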

lapply(split(x, x$V1), myfunkyfunctionthatdoesadifferentthingforeachcolumn)

Of course, that is not a separate function for each column, but a single function can do both jobs:

myfunkyfunctionthatdoesadifferentthingforeachcolumn = function(x) c(sum(x$V2), mean(x$V3))

There are convenient ways to collate the result, such as the following (but check out the plyr package for a comprehensive solution; consider this motivation to learn something better):

 res <- lapply(split(x, x$V1), myfunkyfunctionthatdoesadifferentthingforeachcolumn)
 matrix(unlist(res), ncol = 2, byrow = TRUE, dimnames = list(names(res), c("sum", "mean")))
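Another base R way to collate, sketched with small hypothetical data and a hypothetical helper sum_v2_mean_v3: vapply() checks that every group returns two numbers, takes the row names from the named FUN.VALUE template and the column names from the split() group labels, so t() yields one labelled row per group without matching labels by hand.

```r
# Small hypothetical data standing in for the question's `sample`
x <- data.frame(V1 = c(1, 2, 1, 2),
                V2 = c(10, 20, 30, 40),
                V3 = c(1, 2, 3, 5))

# Hypothetical per-group helper: sum of V2 and mean of V3
sum_v2_mean_v3 <- function(d) c(sum = sum(d$V2), mean = mean(d$V3))

# vapply() returns a 2 x (number of groups) matrix; t() puts groups in rows
res <- t(vapply(split(x, x$V1), sum_v2_mean_v3, c(sum = 0, mean = 0)))
res
```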


Source: https://stackoverflow.com/questions/10702708/how-can-i-apply-different-aggregate-functions-to-different-columns-in-r
