Calculating grouped variance from a frequency table in R

Submitted by 我与影子孤独终老i on 2019-12-02 05:32:58

One option is data.table. Convert the data.frame to a data.table by reference with setDT, then, for each "Group", expand "Value" by its "Count" with rep, take the var of the expanded values, and sum "Count".

library(data.table)
setDT(df1)[, list(GroupVariance=var(rep(Value, Count)),
                      TotalCount=sum(Count)) , by = Group]
#    Group GroupVariance TotalCount
#1:     A           2.7          5
#2:     B           4.0          4

A similar approach using dplyr:

library(dplyr)
group_by(df1, Group) %>% 
      summarise(GroupVariance=var(rep(Value,Count)), TotalCount=sum(Count))
#     Group GroupVariance TotalCount
#1     A           2.7          5
#2     B           4.0          4
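
Both snippets above rely on rep(Value, Count) to turn each row of the frequency table back into individual observations before var is applied. A minimal sketch of that step follows; the data frame freq and its values are hypothetical and are not the data from the question.

```r
# Hypothetical frequency table (NOT the data from the original question)
freq <- data.frame(
  Group = c("A", "A", "B", "B", "B"),
  Value = c(1, 3, 2, 4, 6),
  Count = c(2, 2, 1, 1, 1)
)

# Rows belonging to group A
a <- freq[freq$Group == "A", ]

# rep() with a vector of times repeats each Value by its own Count
expanded_a <- rep(a$Value, a$Count)   # 1 1 3 3

# Sample variance (n - 1 denominator) of the expanded observations
var_a <- var(expanded_a)              # 4/3
```

The expanded vector has sum(Count) elements, which is why TotalCount in the outputs above equals the number of values passed to var.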

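Here's a quick approach with base R. The first step is to expand the data set by Count, repeating each row Count times, and then calculate the variance by group. Note that in this snippet df is the original frequency table and df1 is the expanded copy.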

df1 <- df[rep(seq_len(nrow(df)), df$Count), ]
with(df1, tapply(Value, Group, var))
#   A   B 
# 2.7 4.0 

Or, similarly, with aggregate, which can return the count per group alongside the variance:

aggregate(Value ~ Group, df1, function(x) c(Var = var(x), Count = length(x)))
#   Group Value.Var Value.Count
# 1     A       2.7         5.0
# 2     B       4.0         4.0
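
All of the answers above expand the table to one row or element per observation. If the counts are large, the same sample variance can probably be computed directly from the frequency table with the weighted formula sum(Count * (Value - mean)^2) / (sum(Count) - 1). The sketch below assumes integer counts; the helper name freq_var and the data frame freq are hypothetical and do not come from the original answers.

```r
# Hypothetical frequency table (NOT the data from the original question)
freq <- data.frame(
  Group = c("A", "A", "B", "B", "B"),
  Value = c(1, 3, 2, 4, 6),
  Count = c(2, 2, 1, 1, 1)
)

# Hypothetical helper: sample variance from values and their frequencies,
# without materialising the expanded vector
freq_var <- function(value, count) {
  n <- sum(count)                        # total number of observations
  m <- sum(value * count) / n            # frequency-weighted mean
  sum(count * (value - m)^2) / (n - 1)   # n - 1 denominator, as in var()
}

# Apply the helper to each group's rows
res <- sapply(split(freq, freq$Group),
              function(g) freq_var(g$Value, g$Count))

# Cross-check against the expansion approach used in the answers
check <- sapply(split(freq, freq$Group),
                function(g) var(rep(g$Value, g$Count)))
```

Here res and check should agree up to floating-point error, so the choice between the two is mainly about memory use and readability.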