Calculating grouped variance from a frequency table in R

半阙折子戏 2021-01-23 13:31

How can I calculate, in R, the overall variance and the variance for each group from a dataset that looks like this (for example):

Group Count Value
A      3              


        
2 Answers
  • 2021-01-23 13:38

    Here's a quick approach with base R. The first step is to expand the data set so that each row is repeated Count times; then calculate the variance by group:

    df1 <- df[rep(seq_len(nrow(df)), df$Count), ]
    with(df1, tapply(Value, Group, var))
    #   A   B 
    # 2.7 4.0 
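
    To see what the expansion step does, here is a small sketch on a made-up frequency table. The question's own table is cut off, so the numbers below are an assumption used for illustration only.

    ```r
    # Hypothetical frequency table (NOT the asker's data, which is truncated)
    df <- data.frame(Group = c("A", "A", "B", "B"),
                     Count = c(3, 2, 1, 3),
                     Value = c(5, 8, 11, 7))

    # Each row index is repeated Count times
    idx <- rep(seq_len(nrow(df)), df$Count)
    idx
    # 1 1 1 2 2 3 4 4 4

    # Indexing with the repeated indices gives one row per observation
    df1 <- df[idx, ]
    df1$Value
    # 5 5 5 8 8 11 7 7 7
    ```

    After this step df1 has sum(df$Count) rows, so ordinary functions such as var can be applied to it directly.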
    

    Or similarly, with aggregate, which can also return the number of observations per group:

    aggregate(Value ~ Group, df1, function(x) c(Var = var(x), Count = length(x)))
    #   Group Value.Var Value.Count
    # 1     A       2.7         5.0
    # 2     B       4.0         4.0
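
    The question also asks for the overall variance, which the snippets above do not return. A minimal sketch, assuming the same expansion approach and a made-up table (not the asker's data): take var over the whole expanded column instead of splitting it by group.

    ```r
    # Hypothetical frequency table (NOT the asker's data, which is truncated)
    df <- data.frame(Group = c("A", "A", "B", "B"),
                     Count = c(3, 2, 1, 3),
                     Value = c(5, 8, 11, 7))

    # Expand to one row per observation
    df1 <- df[rep(seq_len(nrow(df)), df$Count), ]

    # Overall sample variance across all groups
    overall_var <- var(df1$Value)
    overall_var
    # 3.75

    # Per-group sample variance, for comparison
    group_var <- tapply(df1$Value, df1$Group, var)
    group_var
    # A = 2.7, B = 4.0
    ```

    Note that the overall variance is not an average of the group variances, because it also reflects the difference between the group means.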
    
  • 2021-01-23 13:46

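    One option is data.table. Convert the data.frame to a data.table with setDT, then, by "Group", take the variance of "Value" with each value repeated "Count" times, and the sum of "Count". Here df1 is the original frequency table, not an already expanded copy, since rep(Value, Count) does the expansion.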

    library(data.table)
    setDT(df1)[, list(GroupVariance = var(rep(Value, Count)),
                      TotalCount = sum(Count)), by = Group]
    #    Group GroupVariance TotalCount
    #1:     A           2.7          5
    #2:     B           4.0          4
    

    A similar approach using dplyr:

    library(dplyr)
    df1 %>%
      group_by(Group) %>%
      summarise(GroupVariance = var(rep(Value, Count)),
                TotalCount = sum(Count))
    #     Group GroupVariance TotalCount
    #1     A           2.7          5
    #2     B           4.0          4
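
    Both versions expand the values with rep(), which can be wasteful when the counts are large. A possible alternative, sketched here in base R on a made-up table, applies the frequency-weighted sample variance formula directly. The helper name freq_var is hypothetical, and the sketch assumes Count holds whole-number frequencies.

    ```r
    # Hypothetical frequency table (NOT the asker's data, which is truncated)
    df <- data.frame(Group = c("A", "A", "B", "B"),
                     Count = c(3, 2, 1, 3),
                     Value = c(5, 8, 11, 7))

    # Sample variance of values x observed with frequencies w,
    # computed without creating the repeated observations
    freq_var <- function(x, w) {
      n <- sum(w)                    # total number of observations
      m <- sum(w * x) / n            # frequency-weighted mean
      sum(w * (x - m)^2) / (n - 1)   # same n - 1 denominator as var()
    }

    # Per-group variance
    by_group <- sapply(split(df, df$Group),
                       function(g) freq_var(g$Value, g$Count))
    by_group
    # A = 2.7, B = 4.0

    # Overall variance
    overall <- freq_var(df$Value, df$Count)
    overall
    # 3.75
    ```

    On this table the results agree with var(rep(Value, Count)). A group with a single observation gives NaN here, because the denominator n - 1 is zero.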
    