I'm trying to build a churn model that includes the maximum number of consecutive UX failures for each customer, and I'm having trouble. Here's my simplified data and desired output.
We group by 'customerId' and use do() to run rle() on the 'isFailure' column. We then extract the lengths of the runs whose values are 'TRUE' (lengths[values]) and create the 'Max' column, with an if/else condition that returns 0 for customers that never had a 1 value.
library(dplyr)

df %>%
  group_by(customerId) %>%
  do({
    # lengths of the runs where isFailure == 1
    tmp <- with(rle(.$isFailure == 1), lengths[values])
    data.frame(customerId = .$customerId,
               Max = if (length(tmp) == 0) 0 else max(tmp))
  }) %>%
  slice(1L)  # keep a single row per customer
#   customerId Max
# 1          1   0
# 2          2   1
# 3          3   2
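To make the rle() step concrete, here is a minimal sketch of the same idea using summarise() instead of do(). The sample data is an assumption (the original data is not shown here); it was chosen only so that the result matches the output above. The helper name max_failure_run is hypothetical.

```r
library(dplyr)

# Hypothetical sample data, picked to reproduce the output shown above
df <- data.frame(
  customerId = c(1, 2, 2, 3, 3, 3, 3),
  isFailure  = c(0, 0, 1, 1, 1, 0, 1)
)

# rle() splits a vector into runs: `values` holds each run's value,
# `lengths` holds how many times it repeats
r <- rle(c(1, 1, 0, 1) == 1)
r$values   # TRUE FALSE TRUE
r$lengths  # 2 1 1

# Hypothetical helper: length of the longest run of 1s, or 0 if there is none
max_failure_run <- function(x) {
  runs <- rle(x == 1)
  failure_lengths <- runs$lengths[runs$values]
  if (length(failure_lengths) == 0) 0L else max(failure_lengths)
}

result <- df %>%
  group_by(customerId) %>%
  summarise(Max = max_failure_run(isFailure))
```

Because summarise() returns one row per group, the slice(1L) step is not needed in this variant.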
Here is my attempt, using only standard dplyr functions:
df %>%
  # grouping key(s):
  group_by(customerId) %>%
  # check if there is any value change
  # if yes, a new sequence id is generated through cumsum
  mutate(last_one = lag(isFailure, 1, default = 100),
         not_eq = last_one != isFailure,
         seq = cumsum(not_eq)) %>%
  # the following is just to find the largest sequence
  count(customerId, isFailure, seq) %>%
  group_by(customerId, isFailure) %>%
  summarise(max_consecutive_event = max(n))
Output:
# A tibble: 5 x 3
# Groups:   customerId [3]
  customerId isFailure max_consecutive_event
       <dbl>     <dbl>                 <int>
1          1         0                     1
2          2         0                     1
3          2         1                     1
4          3         0                     1
5          3         1                     2
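The change-detection and cumsum step is easier to follow on a single vector. This is a small base R sketch of the same idea, not the pipeline itself; the vector x is a hypothetical set of isFailure values for one customer.

```r
# Hypothetical isFailure values for a single customer
x <- c(1, 1, 0, 1)

# TRUE wherever the value differs from the previous one;
# the first element always starts a new run
starts_new_run <- c(TRUE, x[-1] != x[-length(x)])

# The cumulative sum of the change flags gives each run its own id
run_id <- cumsum(starts_new_run)   # 1 1 2 3

# Run lengths per id, mirroring the count() step
run_lengths <- table(run_id)
```

Rows that share a run_id belong to the same run of identical values, so the largest count per value is the longest run for that value.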
A final filter on the isFailure value (isFailure == 1) would yield the wanted result, though customers with no failures would then need to be added back with a count of 0.
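One possible way to do that last step is sketched below: filter to the failure rows, then left-join onto the full list of customers and replace the missing counts with 0. The sample data is an assumption (chosen to match the output above), and the names max_runs, result and Max are only illustrative.

```r
library(dplyr)

# Hypothetical sample data, picked to reproduce the output shown above
df <- data.frame(
  customerId = c(1, 2, 2, 3, 3, 3, 3),
  isFailure  = c(0, 0, 1, 1, 1, 0, 1)
)

# Same idea as the pipeline above, condensed: label runs, then take the longest per value
max_runs <- df %>%
  group_by(customerId) %>%
  mutate(seq = cumsum(isFailure != lag(isFailure, default = 100))) %>%
  ungroup() %>%
  count(customerId, isFailure, seq) %>%
  group_by(customerId, isFailure) %>%
  summarise(max_consecutive_event = max(n), .groups = "drop")

# Keep only failure runs, then add back customers that never failed
result <- df %>%
  distinct(customerId) %>%
  left_join(filter(max_runs, isFailure == 1), by = "customerId") %>%
  mutate(Max = coalesce(max_consecutive_event, 0L)) %>%
  select(customerId, Max)
```

Customers without any failure row get NA from the join, which coalesce() turns into 0.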
The script works for any values in the isFailure column: for each value, it counts the maximum number of consecutive days on which that value occurred.