Display weighted mean by group in the data.frame

Issues regarding the command by and weighted.mean already exist but none was able to help solving my problem. I am new to R and am more used to dat

3 Answers

耶瑟儿～

2020-11-29 11:02

If we use mutate instead of summarise, every row keeps its group's weighted mean, so we can avoid the left_join.

```
library(dplyr)
df %>%
   group_by(education) %>%
   mutate(weighted_income = weighted.mean(income, weight))
#    obs income education weight weighted_income
#  <int>  <int>    <fctr>  <int>           <dbl>
#1     1   1000         A     10        1166.667
#2     2   2000         B      1        1583.333
#3     3   1500         B      5        1583.333
#4     4   2000         A      2        1166.667
```
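
As a sanity check on the numbers above: weighted.mean(x, w) is sum(x * w) / sum(w), so the value for group A can be reproduced by hand. This is only a small sketch; the vectors income_A and weight_A are illustrative names holding group A's income and weight values from the example data.

```r
# Group A rows from the example data: incomes 1000 and 2000, weights 10 and 2
income_A <- c(1000, 2000)
weight_A <- c(10, 2)

# Weighted mean computed by hand: (1000*10 + 2000*2) / (10 + 2)
by_hand <- sum(income_A * weight_A) / sum(weight_A)
by_hand
# [1] 1166.667

# The base R function gives the same value
built_in <- weighted.mean(income_A, weight_A)
built_in
# [1] 1166.667
```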

悲&欢浪女

2020-11-29 11:15

Try using the dplyr package as follows:

```
df <- read.table(text = 'obs income education weight
                          1   1000      A       10
                          2   2000      B        1
                          3   1500      B        5
                          4   2000      A        2',
                 header = TRUE)

library(dplyr)

df_summary <-
  df %>%
  group_by(education) %>%
  summarise(weighted_income = weighted.mean(income, weight))

df_summary
# education weighted_income
#     A        1166.667
#     B        1583.333

df_final <- left_join(df, df_summary, by = 'education')

df_final
# obs income education weight weighted_income
#  1   1000         A     10        1166.667
#  2   2000         B      1        1583.333
#  3   1500         B      5        1583.333
#  4   2000         A      2        1166.667
```

情话喂你

2020-11-29 11:17
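There is a function weighted.mean in base R. Unfortunately, it does not work easily with ave, since ave normally passes FUN a single vector while weighted.mean needs both the values and the weights. One solution is to use data.table: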
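```
library(data.table)
setDT(data)  # convert the data.frame to a data.table by reference
# add the per-group weighted mean as a new column, in place
data[, incomeGroup := weighted.mean(income, weight), by=education]
data
#    income education weight incomeGroup
# 1:   1000         A     10    1166.667
# 2:   2000         B      1    1583.333
# 3:   1500         B      5    1583.333
# 4:   2000         A      2    1166.667
```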
A bizarre method that does work with ave is
```
ave(df[c("income", "weight")], df$education,
    FUN=function(x) weighted.mean(x$income, x$weight))[[1]]
# [1] 1166.667 1583.333 1583.333 1166.667
```
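You pass the two-column subset of the data.frame (income and weight) to ave and group by your grouping variable. The FUN argument is a function that takes each group's rows as a data.frame and applies weighted.mean to its income and weight columns. ave writes each group's result back into that group's rows, and because the final output is still a data.frame, the [[1]] extracts the first column as a vector with the desired result.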

Note that this is only a proof that it is possible. I wouldn't recommend this method: the data.table technique is much cleaner and will be much faster on data sets larger than 1000 observations.
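
If you want to stay in base R without the ave trick, one possible sketch is to split the data by group, compute one weighted mean per group, and look the values up again by group name. It assumes the four-row df from the dplyr answer; the names wm and weighted_income are only illustrative.

```r
# Sample data, same values as in the dplyr answer (assumed here for illustration)
df <- data.frame(
  obs       = 1:4,
  income    = c(1000, 2000, 1500, 2000),
  education = c("A", "B", "B", "A"),
  weight    = c(10, 1, 5, 2)
)

# One weighted mean per education level; the result is a named numeric vector
wm <- sapply(split(df, df$education),
             function(g) weighted.mean(g$income, g$weight))

# Look up each row's group mean by name and attach it as a new column
df$weighted_income <- unname(wm[as.character(df$education)])

df
```

This is likely to be slower than data.table on large data because split copies each group, but it needs no extra packages.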