How to get summary statistics by group

馋奶兔 提交于 2019-11-26 00:18:35

问题


I\'m trying to get multiple summary statistics in R/S-PLUS grouped by categorical column in one shot. I found couple of functions, but all of them do one statistic per call, like `aggregate().

data <- c(62, 60, 63, 59, 63, 67, 71, 64, 65, 66, 68, 66, 
          71, 67, 68, 68, 56, 62, 60, 61, 63, 64, 63, 59)
grp <- factor(rep(LETTERS[1:4], c(4,6,6,8)))
df <- data.frame(group=grp, dt=data)
mg <- aggregate(df$dt, by=df$group, FUN=mean)    
mg <- aggregate(df$dt, by=df$group, FUN=sum)    

What I\'m looking for is to get multiple statistics for the same group like mean, min, max, std, ...etc in one call, is that doable?


回答1:


I'll put in my two cents for tapply().

tapply(df$dt, df$group, summary)

You could write a custom function with the specific statistics you want to replace summary.




回答2:


dplyr package could be nice alternative to this problem:

library(dplyr)

df %>% 
  group_by(group) %>% 
  summarize(mean = mean(dt),
            sum = sum(dt))

To get 1st quadrant and 3rd quadrant

df %>% 
  group_by(group) %>% 
  summarize(q1 = quantile(dt, 0.25),
            q3 = quantile(dt, 0.75))



回答3:


Using Hadley Wickham's purrr package this is quite simple. Use split to split the passed data_frame into groups, then use map to apply the summary function to each group.

library(purrr)

df %>% split(.$group) %>% map(summary)



回答4:


There's many different ways to go about this, but I'm partial to describeBy in the psych package:

describeBy(df$dt, df$group, mat = TRUE) 



回答5:


take a look at the plyr package. Specifically, ddply

ddply(df, .(group), summarise, mean=mean(dt), sum=sum(dt))



回答6:


after 5 long years I'm sure not much attention is going to be received for this answer, But still to make all options complete, here is the one with data.table

library(data.table)
setDT(df)[ , list(mean_gr = mean(dt), sum_gr = sum(dt)) , by = .(group)]
#   group mean_gr sum_gr
#1:     A      61    244
#2:     B      66    396
#3:     C      68    408
#4:     D      61    488 



回答7:


Besides describeBy, the doBy package is an another option. It provides much of the functionality of SAS PROC SUMMARY. Details: http://www.statmethods.net/stats/descriptives.html




回答8:


While some of the other approaches work, this is pretty close to what you were doing and only uses base r. If you know the aggregate command this may be more intuitive.

with( df , aggregate( dt , by=list(group) , FUN=summary)  )



回答9:


First, it depends on your version of R. If you've passed 2.11, you can use aggreggate with multiple results functions(summary, by instance, or your own function). If not, you can use the answer made by Justin.



来源:https://stackoverflow.com/questions/9847054/how-to-get-summary-statistics-by-group

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!