Find top deciles from dataframe by group

花落未央 2021-01-23 07:59

I am trying to create new variables using a function and lapply rather than working directly on the data with loops. I used to use Stata and would have solved this

3 answers
  •  北恋 (original poster)
    2021-01-23 08:37

    You don't need the function pf to achieve what you want. Try a dplyr/tidyr combination:

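    library(dplyr)
    library(tidyr)
    data %>% 
        group_by(v1) %>% 
        # sort by v2, largest first
        arrange(desc(v2)) %>%
        # n = number of rows in each v1 group
        mutate(n = n()) %>% 
        # keep the top 20% of rows in each group
        filter(row_number() <= round(n * .2)) %>% 
        # label the top 10% as 10 and the next 10% as 20
        mutate(top_pct = ifelse(row_number() <= round(n * .1), 10, 20)) %>%
        # drop the grouping, then reshape: one row per custID, one column per v1 level
        ungroup() %>%
        select(custID, v1, top_pct) %>% 
        spread(v1, top_pct)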
    #  custID  A  B
    #1      1 10 10
    #2      2 20 20
    #3      6 NA 10
    #4      7 NA 20
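
The original `data` is not shown in the thread, so here is a minimal self-contained sketch of the same approach on made-up data. The sample data frame and the name `top` are assumptions for illustration only; `pivot_wider()` is swapped in for `spread()`, which assumes tidyr 1.0.0 or later.

```r
library(dplyr)
library(tidyr)

# Hypothetical sample data (not from the original question):
# 10 customers in each of two groups, A and B.
data <- data.frame(
  custID = rep(1:10, times = 2),
  v1     = rep(c("A", "B"), each = 10),
  v2     = c(1:10, 10:1)
)

top <- data %>%
  group_by(v1) %>%
  # sort within each group, largest v2 first
  arrange(desc(v2), .by_group = TRUE) %>%
  mutate(n = n()) %>%
  # keep the top 20% of each group (2 of 10 rows here)
  filter(row_number() <= round(n * 0.2)) %>%
  # label the top 10% as 10 and the next 10% as 20
  mutate(top_pct = ifelse(row_number() <= round(n * 0.1), 10, 20)) %>%
  ungroup() %>%
  select(custID, v1, top_pct) %>%
  # one row per custID, one column per v1 level
  pivot_wider(names_from = v1, values_from = top_pct)

print(top)
```

With this sample data, customers 10 and 9 are the top two in group A and customers 1 and 2 are the top two in group B, so each of them gets a value in only one column and NA in the other.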
    
