Summarize consecutive failures with dplyr and rle

[愿得一人] 2021-01-20 03:56

I'm trying to build a churn model that includes the maximum consecutive number of UX failures for each customer and having trouble. Here's my simplified data and desired o

2 Answers
  •  攒了一身酷
    2021-01-20 04:18

    Here is my attempt, using only standard dplyr functions:

    library(dplyr)

    df %>%
      # grouping key(s):
      group_by(customerId) %>%
      # flag rows whose value differs from the previous row in the group;
      # the running sum of those flags gives each run its own sequence id
      mutate(last_one = lag(isFailure, 1, default = 100),
             not_eq = last_one != isFailure,
             seq = cumsum(not_eq)) %>%
      # count the rows in each run, then keep the longest run per value
      count(customerId, isFailure, seq) %>%
      group_by(customerId, isFailure) %>%
      summarise(max_consecutive_event = max(n))
    

    Output:

    # A tibble: 5 x 3
    # Groups:   customerId [3]
      customerId isFailure max_consecutive_event
    1          1         0                     1
    2          2         0                     1
    3          2         1                     1
    4          3         0                     1
    5          3         1                     2
    

    Filtering on isFailure == 1 at the end yields the wanted result. Customers who never failed have no such row, so they drop out of the filtered table and need to be added back with a count of 0.
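As a rough sketch of that last step (filter on the failure value, then restore customers with no failures), something like the following could work. The sample `df` is hypothetical, chosen only to be consistent with the output shown above, since the question's data is not reproduced here. The names `runs`, `result`, and `max_consecutive_failures` are made up for illustration.

```r
library(dplyr)

# Hypothetical sample data, consistent with the output shown above
df <- tibble(
  customerId = c(1, 2, 2, 3, 3, 3),
  isFailure  = c(0, 0, 1, 0, 1, 1)
)

# Longest run per customer and per isFailure value (same logic as the answer)
runs <- df %>%
  group_by(customerId) %>%
  mutate(seq = cumsum(isFailure != lag(isFailure, 1, default = 100))) %>%
  count(customerId, isFailure, seq) %>%
  group_by(customerId, isFailure) %>%
  summarise(max_consecutive_event = max(n), .groups = "drop")

# Keep only the failure runs, then join back onto the full customer list
# so that customers without any failure get 0 instead of disappearing
result <- df %>%
  distinct(customerId) %>%
  left_join(filter(runs, isFailure == 1), by = "customerId") %>%
  mutate(max_consecutive_failures = coalesce(max_consecutive_event, 0L)) %>%
  select(customerId, max_consecutive_failures)

print(result)
```

The left join starts from every distinct `customerId`, so the zero-failure customers survive the filter and `coalesce()` turns their missing value into 0.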

    The script works for any values in the isFailure column: for each value, it counts the longest run of consecutive rows that share it.
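Since the title mentions rle, here is a minimal sketch of the same idea using base R's `rle()` inside `summarise()`. It is an alternative to the lag/cumsum approach above, not the answerer's method. The helper `max_run` and the sample `df` are hypothetical, and the data is assumed to be already sorted in event order within each customer.

```r
library(dplyr)

# Hypothetical sample data, consistent with the output shown above
df <- tibble(
  customerId = c(1, 2, 2, 3, 3, 3),
  isFailure  = c(0, 0, 1, 0, 1, 1)
)

# Hypothetical helper: length of the longest run of `value` in `x`.
# The trailing 0L makes max() return 0 when `value` never occurs.
max_run <- function(x, value = 1) {
  r <- rle(x)
  max(r$lengths[r$values == value], 0L)
}

result <- df %>%
  group_by(customerId) %>%
  summarise(max_consecutive_failures = max_run(isFailure))

print(result)
```

Because `max_run()` falls back to 0, customers with no failures stay in the result without a separate join.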
