I'd like to generate cumulative sums with a reset if the "current" sum exceeds some threshold, using dplyr. In the example below, I want to take the cumulative sum over 'a'.
lib
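The setup code is cut off here. As a rough sketch only, an input consistent with the results printed further down could look like the following. The values of `t` and `a` are read back from those printed tables and are an assumption, not the original setup code.

```r
library(dplyr)  # provides tibble(), mutate() and %>%

# Assumed input, reconstructed from the printed results below;
# this is not the original (truncated) setup code.
tib <- tibble(
  t = c(1, 2, 3, 4, 5, 6),
  a = c(2, 3, 1, 2, 2, 3)
)
```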
If you're interested in building groups based on cumsum <= threshold, you can use the following base R function:
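cumSumReset <- function(x, thresh = 4) {
  ans <- numeric()
  i <- 0
  while (length(x) > 0) {
    cs_over <- cumsum(x)
    # count of running sums within thresh (the leading run, for non-negative x)
    ntimes <- sum(cs_over <= thresh)
    # guard: an element larger than thresh gets a group of its own;
    # without this, x[-(1:0)] drops it and ans ends up shorter than the input
    ntimes <- max(ntimes, 1)
    x <- x[-(1:ntimes)]
    ans <- c(ans, rep(i, ntimes))
    i <- i + 1
  }
  return(ans)
}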
call:
tib %>% mutate(g = cumSumReset(a, 5))
result:
# A tibble: 6 x 3
# t a g
# <dbl> <dbl> <dbl>
#1 1 2 0
#2 2 3 0
#3 3 1 1
#4 4 2 1
#5 5 2 1
#6 6 3 2
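One thing the group index `g` in the result above can be used for is the reset cumulative sum itself. The following is a minimal base R sketch, with `a` and `g` copied by hand from the printed result. The names `cs` and `totals` are made up for illustration.

```r
# Values copied from the result shown above.
a <- c(2, 3, 1, 2, 2, 3)
g <- c(0, 0, 1, 1, 1, 2)

# Running sum that restarts whenever the group index changes.
cs <- ave(a, g, FUN = cumsum)
cs  # 2 5 1 3 5 3

# Total per group; each one stays at or below the threshold of 5.
totals <- tapply(a, g, sum)
```

Inside a pipeline, the same idea would presumably be a `group_by(g)` followed by `mutate()` with `cumsum(a)`.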
With g you can now do whatever you like.

I think you can use accumulate() from purrr here to help. I've also made a wrapper function to use for different thresholds:
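sum_reset_at <- function(thresh) {
  function(x) {
    # .x is the running sum so far, .y the next value;
    # once .x has reached thresh, start over from .y
    accumulate(x, ~if_else(.x >= thresh, .y, .x + .y))
  }
}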
tib %>% mutate(c = sum_reset_at(5)(a))
# t a c
# <dbl> <dbl> <dbl>
# 1 1 2 2
# 2 2 3 5
# 3 3 1 1
# 4 4 2 3
# 5 5 2 5
# 6 6 3 3
tib %>% mutate(c = sum_reset_at(4)(a))
# t a c
# <dbl> <dbl> <dbl>
# 1 1 2 2
# 2 2 3 5
# 3 3 1 1
# 4 4 2 3
# 5 5 2 5
# 6 6 3 3
tib %>% mutate(c = sum_reset_at(6)(a))
# t a c
# <dbl> <dbl> <dbl>
# 1 1 2 2
# 2 2 3 5
# 3 3 1 6
# 4 4 2 2
# 5 5 2 4
# 6 6 3 7
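Note that this variant resets only after the running sum has reached the threshold, so a value can overshoot it (the 7 in the last row for a threshold of 6). If purrr is not available, the same fold can probably be written in base R with Reduce(). This is a sketch, and `sum_reset_base` is a made-up name rather than part of the answer above.

```r
# Hypothetical base R counterpart of sum_reset_at(); not from the original answer.
sum_reset_base <- function(x, thresh) {
  # Reduce() with accumulate = TRUE keeps every intermediate value of the fold.
  Reduce(
    function(acc, y) if (acc >= thresh) y else acc + y,
    x,
    accumulate = TRUE
  )
}

a <- c(2, 3, 1, 2, 2, 3)
sum_reset_base(a, 5)  # 2 5 1 3 5 3
sum_reset_base(a, 6)  # 2 5 6 2 4 7
```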