I'm trying to build a churn model that includes the maximum number of consecutive UX failures for each customer, and I'm having trouble. Here's my simplified data and desired o
Here is my attempt, using only standard dplyr functions:
library(dplyr)

df %>%
  # grouping key(s):
  group_by(customerId) %>%
  # flag every change in value within a customer;
  # cumsum over the flags gives each run its own sequence id
  mutate(last_one = lag(isFailure, 1, default = 100),
         not_eq = last_one != isFailure,
         seq = cumsum(not_eq)) %>%
  # the following just finds the longest sequence
  count(customerId, isFailure, seq) %>%
  group_by(customerId, isFailure) %>%
  summarise(max_consecutive_event = max(n))
Output:
# A tibble: 5 x 3
# Groups:   customerId [3]
  customerId isFailure max_consecutive_event
1          1         0                     1
2          2         0                     1
3          2         1                     1
4          3         0                     1
5          3         1                     2
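To see how the sequence ids come about, here is a minimal sketch of the intermediate columns for a single customer. The event values are made up for illustration and are not the data from the question.

```r
library(dplyr)

# Made-up event sequence for one customer (an assumption for illustration)
steps <- tibble(isFailure = c(1, 1, 0, 1, 1, 1)) %>%
  mutate(last_one = lag(isFailure, 1, default = 100),  # previous value; 100 is a sentinel for the first row
         not_eq   = last_one != isFailure,             # TRUE on the first row of each run
         seq      = cumsum(not_eq))                    # running count of run starts, i.e. the run id

steps$seq
```

Rows that share a `seq` value belong to the same run, so counting rows per `seq` gives the run lengths.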
A final filter on the isFailure value yields the wanted result, although customers with no failures at all then have to be added back with a count of 0.
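One possible way to do that last step is sketched below: filter to the failure rows, then join back onto the full list of customers and fill the gaps with 0. The sample data and the column name `max_consecutive_failures` are assumptions made for this sketch, not part of the question.

```r
library(dplyr)

# Hypothetical sample data (not the asker's original), already ordered by event within customer
df <- tibble(
  customerId = c(1, 2, 2, 3, 3, 3),
  isFailure  = c(0, 0, 1, 1, 1, 0)
)

# Longest run per customer and isFailure value, same approach as the pipeline above
runs <- df %>%
  group_by(customerId) %>%
  mutate(seq = cumsum(isFailure != lag(isFailure, 1, default = 100))) %>%
  count(customerId, isFailure, seq) %>%
  group_by(customerId, isFailure) %>%
  summarise(max_consecutive_event = max(n), .groups = "drop")

result <- runs %>%
  filter(isFailure == 1) %>%                                   # keep failure runs only
  select(customerId, max_consecutive_failures = max_consecutive_event) %>%
  right_join(distinct(df, customerId), by = "customerId") %>%  # bring back customers with no failures
  mutate(max_consecutive_failures = coalesce(max_consecutive_failures, 0L)) %>%
  arrange(customerId)

result
```

The `right_join` against `distinct(df, customerId)` is just one option; any method that restores the missing customers and replaces the resulting `NA` with 0 would do.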
The script works for any values in the isFailure column: it counts the maximum number of consecutive days on which the column holds the same value.
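As a cross-check on the idea, base R's `rle()` computes run lengths directly. This is a different technique from the dplyr pipeline above, shown here on a made-up character vector (the values `"ok"` and `"fail"` are assumptions for illustration).

```r
# Made-up status values for a single customer (an assumption for illustration)
status <- c("ok", "fail", "fail", "fail", "ok", "fail")

runs <- rle(status)                                  # lengths and values of each run
max_run <- tapply(runs$lengths, runs$values, max)    # longest run per distinct value

max_run[["fail"]]
```

Applied per customer, this should agree with the `max_consecutive_event` column produced by the pipeline.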