Display weighted mean by group in the data.frame

一整个雨季 2020-11-29 10:25

Questions about the by command and weighted.mean already exist, but none helped me solve my problem. I am new to R and am more used to dat

3 answers
  • 2020-11-29 11:02

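    If we use mutate instead of summarise, we can avoid the left_join, because mutate keeps every row and attaches the group's weighted mean to each one: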

    library(dplyr)
    df %>%
       group_by(education) %>% 
       mutate(weighted_income = weighted.mean(income, weight))
    #    obs income education weight weighted_income
    #  <int>  <int>    <fctr>  <int>           <dbl>
    #1     1   1000         A     10        1166.667
    #2     2   2000         B      1        1583.333
    #3     3   1500         B      5        1583.333
    #4     4   2000         A      2        1166.667
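
    One thing to be aware of: the result of the pipeline above is still grouped by education, which can affect later dplyr verbs. A minimal sketch, assuming the same four-row example data (rebuilt here with data.frame so the block runs on its own; the name result is only illustrative), that drops the grouping with ungroup():

    ```r
    library(dplyr)

    # Example data rebuilt for this sketch (same values as in the answers)
    df <- data.frame(
      obs = 1:4,
      income = c(1000, 2000, 1500, 2000),
      education = c("A", "B", "B", "A"),
      weight = c(10, 1, 5, 2)
    )

    result <- df %>%
      group_by(education) %>%
      mutate(weighted_income = weighted.mean(income, weight)) %>%
      ungroup()  # remove the grouping so later verbs act on the whole table

    result
    ```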
    
  • 2020-11-29 11:15

    Try using the dplyr package as follows:

    df <- read.table(text = 'obs income education weight   
                              1   1000      A       10     
                              2   2000      B        1     
                              3   1500      B        5     
                              4   2000      A        2', 
                     header = TRUE)     
    
    library(dplyr)
    
    df_summary <- 
      df %>% 
      group_by(education) %>% 
      summarise(weighted_income = weighted.mean(income, weight))
    
    df_summary
    # education weighted_income
    #     A        1166.667
    #     B        1583.333
    
    df_final <- left_join(df, df_summary, by = 'education')
    
    df_final
    # obs income education weight weighted_income
    #  1   1000         A     10        1166.667
    #  2   2000         B      1        1583.333
    #  3   1500         B      5        1583.333
    #  4   2000         A      2        1166.667
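
    As a sanity check on the numbers above, the weighted mean of a group is sum(weight * income) / sum(weight). The following sketch recomputes the value for group A by hand and compares it with weighted.mean; the names a, manual_a and builtin_a are only illustrative.

    ```r
    # Same four rows as the example data
    df <- data.frame(
      obs = 1:4,
      income = c(1000, 2000, 1500, 2000),
      education = c("A", "B", "B", "A"),
      weight = c(10, 1, 5, 2)
    )

    # Rows belonging to group A
    a <- df[df$education == "A", ]

    # By hand: (1000 * 10 + 2000 * 2) / (10 + 2) = 14000 / 12
    manual_a <- sum(a$income * a$weight) / sum(a$weight)

    # Same quantity from the built-in function
    builtin_a <- weighted.mean(a$income, a$weight)

    all.equal(manual_a, builtin_a)
    # [1] TRUE
    ```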
    
  • 2020-11-29 11:17

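    Base R provides the function weighted.mean. Unfortunately, it does not work easily with ave, because ave passes only one vector per group to FUN, while weighted.mean needs both the values and the weights. One solution is to use data.table (the data frame is called data here):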

    library(data.table)
    setDT(data)
    data[, incomeGroup := weighted.mean(income, weight), by=education]
    data
       income education weight incomeGroup
    1:   1000         A     10    1166.667
    2:   2000         B      1    1583.333
    3:   1500         B      5    1583.333
    4:   2000         A      2    1166.667
    

    A bizarre method that does work with ave is

    ave(df[c("income", "weight")], df$education,
        FUN=function(x) weighted.mean(x$income, x$weight))[[1]]
    [1] 1166.667 1583.333 1583.333 1166.667
    

    You pass the two-column subset of the data.frame as the first argument and the grouping variable as the second. FUN is an anonymous function that receives each group's rows as a data.frame and applies weighted.mean to its income and weight columns. Because ave returns a data.frame here, [[1]] extracts the first column as a vector with the desired result.

    Note that this is just a proof that it is possible. I wouldn't recommend this method: the data.table technique is much cleaner and will be much faster on data sets larger than 1000 observations.
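
    If adding a package is not an option, a more conventional base R route is to compute one weighted mean per group and then look the value up for each row. This is only a sketch, not part of the original answer; it assumes the same example data, and the names group_means and weighted_income are illustrative.

    ```r
    # Example data rebuilt so the block runs on its own
    df <- data.frame(
      obs = 1:4,
      income = c(1000, 2000, 1500, 2000),
      education = c("A", "B", "B", "A"),
      weight = c(10, 1, 5, 2)
    )

    # One weighted mean per education level, as a named numeric vector
    group_means <- sapply(
      split(df, df$education),
      function(g) weighted.mean(g$income, g$weight)
    )

    # Look up each row's group mean by the name of its education level
    df$weighted_income <- unname(group_means[as.character(df$education)])

    df
    ```

    Unlike the ave trick, each step here handles ordinary vectors and data.frames, so it is easier to read, though it has not been benchmarked against data.table.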
