Find top deciles from dataframe by group

Submitted by 只谈情不闲聊 on 2019-12-02 05:32:41

The idiomatic way to do this kind of thing in R would be to use a combination of split and lapply. You're halfway there with your use of lapply; you just need to use split as well.

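lapply(split(data, data$v1), function(df) {
    # 80th and 90th percentiles of v2 within this group
    cutoff <- quantile(df$v2, c(0.8, 0.9))
    # 10 = above the 90th percentile, 20 = above the 80th, NA = neither
    top_pct <- ifelse(df$v2 > cutoff[2], 10, ifelse(df$v2 > cutoff[1], 20, NA))
    # keep only the customers that made either cut
    na.omit(data.frame(id=df$custID, top_pct))
})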

The cutoffs come from quantile, which here returns the 80th and 90th percentiles of v2 within each group.
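To make the cutoffs concrete, here is a small sketch on made-up data. The data frame below is a hypothetical stand-in for the OP's data (same column names custID, v1 and v2, invented values), so treat the numbers as illustration only.

```r
# Toy stand-in for the OP's data; the values are invented for illustration
data <- data.frame(
    custID = c(1:10, 1:5),
    v1     = rep(c("A", "B"), times = c(10, 5)),
    v2     = c(1:10, 10, 20, 30, 40, 50)
)

# 80th and 90th percentiles of v2 in group A (default quantile type)
quantile(data$v2[data$v1 == "A"], c(0.8, 0.9))
# 80% 90% 
# 8.2 9.1 

# Same split/lapply approach as above, applied to the toy data
res <- lapply(split(data, data$v1), function(df) {
    cutoff <- quantile(df$v2, c(0.8, 0.9))
    top_pct <- ifelse(df$v2 > cutoff[2], 10, ifelse(df$v2 > cutoff[1], 20, NA))
    na.omit(data.frame(id = df$custID, top_pct))
})
res
```

In group A, customers 9 and 10 come out as 20 and 10 respectively. In group B the cutoffs are 42 and 46, so only customer 5 clears either of them. Small groups can therefore end up with nobody labelled 20.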

Stick to your Stata instincts and use a single data set:

library(data.table)
DT <- data.table(data)

# r is each row's rank of v2 within its v1 group, scaled to (0, 1]
DT[,r:=rank(v2)/.N,by=v1]

You can see the result by typing DT.
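If it helps to see what rank(v2)/.N is doing without data.table, here is a rough base R equivalent using ave. The vectors are invented for illustration and are not the OP's data.

```r
# Invented example values: two groups, with a tie in group A
v1 <- c("A", "A", "A", "A", "B", "B")
v2 <- c(30, 29, 10, 10, 5, 7)

# Within each v1 group: rank of v2 divided by the group size.
# rank() averages tied ranks by default, so the two 10s share rank 1.5.
r <- ave(v2, v1, FUN = function(x) rank(x) / length(x))
r
# [1] 1.000 0.750 0.375 0.375 0.500 1.000
```

The largest value in each group always gets r = 1, and tied values get the same r, which is why tied customers land in the same bucket later on.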


From here, you can group the within-v1 rank, r, if you want to. Following Stata idioms...

DT[,g:={
  x = rep(0,.N)   # default: not in the top 20%
  x[r>.8] = 20    # top 20% ...
  x[r>.9] = 10    # ... overwritten with 10 for the top 10%
  x
}]

This is like gen and then two replace ... if statements. Again, you can see the result with DT.
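If you'd rather skip the fill-then-overwrite pattern, one possible alternative (my own sketch, not part of the approach above) is to bucket r in a single step with base R's findInterval. The r values below are made up.

```r
# Made-up within-group rank fractions
r <- c(1, 0.9, 0.875, 0.8, 0.5)

# left.open = TRUE makes the intervals (0.8, 0.9] and (0.9, Inf),
# which matches the strict comparisons r > .8 and r > .9
bucket <- findInterval(r, c(0.8, 0.9), left.open = TRUE)   # 0, 1 or 2

# Map bucket 0 -> 0, bucket 1 -> 20, bucket 2 -> 10
g <- c(0, 20, 10)[bucket + 1]
g
# [1] 10 20 20  0  0
```

Note that r = 0.9 gets 20, not 10, exactly as with x[r>.9] = 10, because both comparisons are strict.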


Finally, you can subset with

DT[g>0]

which gives

   custID v1 v2     r  g
1:      1  A 30 1.000 10
2:      2  A 29 0.900 20
3:      1  B 20 0.975 10
4:      2  B 19 0.875 20
5:      6  B 20 0.975 10
6:      7  B 19 0.875 20

These steps can also be chained together:

DT[,r:=rank(v2)/.N,by=v1][,g:={x = rep(0,.N);x[r>.8] = 20;x[r>.9] = 10;x}][g>0]

(Thanks to @ExperimenteR:)

To rearrange for the desired output in the OP, with values of v1 in columns, use dcast:

dcast(
  DT[,r:=rank(v2)/.N,by=v1][,g:={x = rep(0,.N);x[r>.8] = 20;x[r>.9] = 10;x}][g>0],
  custID~v1, value.var="g")

When this was written, dcast on a data.table required the latest version of data.table, available (I think) from GitHub.

You don't need the function pf to achieve what you want. Try a dplyr/tidyr combination:

library(dplyr)
library(tidyr)
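data %>% 
    group_by(v1) %>% 
    arrange(desc(v2)) %>%                       # largest v2 first
    mutate(n = n()) %>%                         # group size
    filter(row_number() <= round(n * .2)) %>%   # keep the top 20% of rows per group
    mutate(top_pct = ifelse(row_number() <= round(n * .1), 10, 20)) %>%
    select(custID, top_pct) %>%                 # v1 is kept automatically as the grouping column
    spread(v1, top_pct)                         # one column per value of v1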
#  custID  A  B
#1      1 10 10
#2      2 20 20
#3      6 NA 10
#4      7 NA 20
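One thing worth checking with this approach is how many rows each label gets, since it is driven by round(n * .2) and round(n * .1). A quick sketch, where label_counts is a hypothetical helper of mine and the group sizes are assumed for illustration:

```r
# Hypothetical helper: how many rows per group get each label
label_counts <- function(n) {
    kept  <- round(n * 0.2)   # rows that survive filter()
    top10 <- round(n * 0.1)   # of those, rows labelled 10
    c(kept = kept, label_10 = top10, label_20 = kept - top10)
}

# Assumed group sizes: 10, 20, and a small group of 5
m <- sapply(c(A = 10, B = 20, small = 5), label_counts)
m
#          A B small
# kept     2 4     1
# label_10 1 2     0
# label_20 1 2     1
```

For a group of 5, round(0.5) is 0 in R, so the single kept row is labelled 20 rather than 10. If your groups can be that small, you may want ceiling() instead of round().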