R, dplyr: cumulative version of n_distinct


I have a dataframe as follows. It is ordered by column time.

Input -

df = data.frame(time = 1:20,
            grp = sort(rep(1:5,4)),
             


        
4 Answers
  •  醉话见心
    2021-02-09 03:51

    A dplyr solution inspired from @akrun's answer -

    The logic is to set the first occurrence of each unique value of var1 to 1 and the rest to 0 within each group grp, and then apply cumsum to that indicator -

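    library(dplyr)
    
    df = df %>%
      arrange(time) %>%
      # flag the first occurrence of each var1 value within each grp
      group_by(grp, var1) %>%
      mutate(var_temp = ifelse(row_number() == 1, 1, 0)) %>%
      # running total of the flags = cumulative count of distinct values
      group_by(grp) %>%
      mutate(var2 = cumsum(var_temp)) %>%
      select(-var_temp)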
    
    head(df,10)
    
    Source: local data frame [10 x 4]
    Groups: grp
    
       time grp var1 var2
    1     1   1    A    1
    2     2   1    B    2
    3     3   1    A    2
    4     4   1    B    2
    5     5   2    A    1
    6     6   2    B    2
    7     7   2    A    2
    8     8   2    B    2
    9     9   3    A    1
    10   10   3    B    2
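
The same first-occurrence idea can probably be written more compactly with base R's `duplicated()`. This is only a sketch, not the answer's own method: the question's input is cut off, so the `var1` values below are an assumption, chosen to match the output shown above (A, B alternating).

```r
library(dplyr)

# Sample data. The var1 column is ASSUMED: the original input is truncated,
# so an alternating A/B pattern is used to match the printed output.
df <- data.frame(time = 1:20,
                 grp  = sort(rep(1:5, 4)),
                 var1 = rep(c("A", "B"), 10))

res <- df %>%
  arrange(time) %>%
  group_by(grp) %>%
  # !duplicated(var1) is TRUE on the first occurrence of each value in the
  # group; cumsum() turns those flags into a running distinct count
  mutate(var2 = cumsum(!duplicated(var1))) %>%
  ungroup()

head(res, 10)
```

This avoids the temporary column and the second `group_by()`, at the cost of hiding the two-step logic that the answer spells out.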
    
