Cumulative sum that resets when 0 is encountered

后端 未结 4 1879
有刺的猬
有刺的猬 2020-11-27 19:55

I would like to do a cumulative sum on a field but reset the aggregated value whenever a 0 is encountered.

Here is an example of what I want :

data.f         


        
相关标签:
4条回答
  • 2020-11-27 20:09

    You could use the Reduce function with a custom function that returns 0 when the new value encountered is 0 and otherwise adds the new value to the accumulated value:

    Reduce(function(x, y) if (y == 0) 0 else x+y, c(1, 0, 1, 1), accumulate=TRUE)
    # [1] 1 0 1 2
    
    0 讨论(0)
  • 2020-11-27 20:31

    Another base would be just

    with(df, ave(b, cumsum(b == 0), FUN = cumsum))
    ## [1] 1 0 1 2
    

    This will just divide column b to groups according to 0 appearances and compute the cumulative sum of b per these groups


    Another solution using the latest data.table version (v 1.9.6+)

    library(data.table) ## v 1.9.6+
    setDT(df)[, whatiwant := cumsum(b), by = rleid(b == 0L)]
    #    campaign  date b whatiwant
    # 1:        a   jan 1         1
    # 2:        b   feb 0         0
    # 3:        c march 1         1
    # 4:        d april 1         2
    

    Some benchmarks per comments

    set.seed(123)
    x <- sample(0:1e3, 1e7, replace = TRUE)
    system.time(res1 <- ave(x, cumsum(x == 0), FUN = cumsum))
    # user  system elapsed 
    # 1.54    0.24    1.81 
    system.time(res2 <- Reduce(function(x, y) if (y == 0) 0 else x+y, x, accumulate=TRUE))
    # user  system elapsed 
    # 33.94    0.39   34.85 
    library(data.table)
    system.time(res3 <- data.table(x)[, whatiwant := cumsum(x), by = rleid(x == 0L)])
    # user  system elapsed 
    # 0.20    0.00    0.21 
    
    identical(res1, as.integer(res2))
    ## [1] TRUE
    identical(res1, res3$whatiwant)
    ## [1] TRUE
    
    0 讨论(0)
  • 2020-11-27 20:33

    hutilscpp::cumsum_reset is designed for this purpose. The first argument is a logical vector, indicating when the cumulative sum should continue. The second argument is the input to the cumulative sum itself

    library(hutilscpp)
    b <- c(1, 0, 1, 1)
    cumsum_reset(as.logical(b), b)
    

    On my machine, compared to the data.table function above, this use of cumsum_reset is about 3 times faster.

    0 讨论(0)
  • 2020-11-27 20:34

    Another late idea:

    ff = function(x)
    {
        cs = cumsum(x)
        cs - cummax((x == 0) * cs)
    }
    ff(c(0, 1, 3, 0, 0, 5, 2))
    #[1] 0 1 4 0 0 5 7
    

    And to compare:

    library(data.table)
    ffdt = function(x) 
        data.table(x)[, whatiwant := cumsum(x), by = rleid(x == 0L)]$whatiwant
    
    x = as.numeric(x) ##because 'cumsum' causes integer overflow
    identical(ff(x), ffdt(x))
    #[1] TRUE
    microbenchmark::microbenchmark(ff(x), ffdt(x), times = 25)
    #Unit: milliseconds
    #    expr      min       lq   median       uq      max neval
    #   ff(x) 315.8010 362.1089 372.1273 386.3892 405.5218    25
    # ffdt(x) 374.6315 407.2754 417.6675 447.8305 534.8153    25
    
    0 讨论(0)
提交回复
热议问题