Cumulative sum until maximum reached, then repeat from zero in the next row

粉色の甜心 2020-11-28 14:07

I feel like this is a fairly easy question, but for the life of me I can't seem to find the answer. I have a fairly standard dataframe, and what I am trying to do is sum th

3 Answers
  • 2020-11-28 14:17

    Assuming your data.frame is df:

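    # Running total of the preceding rows (0 for the first row)
    df$difference_sum <- c(0, head(cumsum(df$difference), -1))
    # Number of rows whose running total is still below 1470
    len <- sum(df$difference_sum %/% 1470 == 0)
    # Group id: a new group starts every len rows
    df$keep <- (seq_len(nrow(df)) - 1) %/% len
    # Restart the running total from zero within each group
    df <- transform(df, difference_sum = ave(difference, keep,
              FUN = function(x) c(0, head(cumsum(x), -1))))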
    
    #       minutes difference keep difference_sum
    # 1  1052991158        180    0              0
    # 2  1052991338        180    0            180
    # 3  1052991518        180    0            360
    # 4  1052991698        180    0            540
    # 5  1052991878        180    0            720
    # 6  1052992058        180    0            900
    # 7  1052992238        180    0           1080
    # 8  1052992418        180    0           1260
    # 9  1052992598        180    0           1440
    # 10 1052992778        180    1              0
    # 11 1052992958        180    1            180
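
    To try this out, a data frame matching the printed output can be rebuilt as follows. This is only a minimal sketch: the `minutes` and `difference` values are read off the output above and are an assumption about the real data; the rest repeats the code from this answer.

    ```r
    # Sample data reconstructed from the printed output (an assumption about the real data)
    df <- data.frame(minutes = 1052991158 + 180 * (0:10), difference = 180)

    df$difference_sum <- c(0, head(cumsum(df$difference), -1))
    len <- sum(df$difference_sum %/% 1470 == 0)
    df$keep <- (seq_len(nrow(df)) - 1) %/% len
    df <- transform(df, difference_sum = ave(difference, keep,
              FUN = function(x) c(0, head(cumsum(x), -1))))

    # Print the columns in the same order as the output shown above
    print(df[, c("minutes", "difference", "keep", "difference_sum")])
    ```

    Note that `len` is measured once on the first run and then applied by row position, so this relies on every run spanning the same number of rows. That holds here because `difference` is constant.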
    
  • 2020-11-28 14:26

    I think this is best done with a for loop; I can't think of a function that does this out of the box. The following should do what you want (if I understand you correctly).

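    current.sum <- 0
    for (i in seq_len(nrow(caribou.sub))) {
        # Add this row's difference to the running total and store it
        current.sum <- current.sum + caribou.sub[i, "difference"]
        caribou.sub[i, "difference_sum"] <- current.sum
        # Flag the row that reaches the limit, then restart from zero
        if (current.sum >= 1470) {
            caribou.sub[i, "keep"] <- 1
            current.sum <- 0
        }
    }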
    

    Feel free to comment if it does not do exactly what you want. But as pointed out by alexwhan, your description is not completely clear.
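
    For what it's worth, the same running-total-with-restart logic can also be sketched without an explicit loop using base R's `Reduce()` with `accumulate = TRUE`. The sample data below is hypothetical (11 rows with a constant difference of 180), and here `keep` is 0 rather than missing on the rows that do not reach the limit.

    ```r
    # Hypothetical sample data: 11 rows, each with a difference of 180
    caribou.sub <- data.frame(difference = rep(180, 11))

    # Running total; once the previous total has reached 1470, start again from the current value
    running <- Reduce(
      function(acc, x) if (acc >= 1470) x else acc + x,
      caribou.sub$difference,
      accumulate = TRUE
    )

    caribou.sub$difference_sum <- running
    # Flag the rows on which the limit was reached
    caribou.sub$keep <- as.integer(running >= 1470)

    print(caribou.sub)
    # difference_sum: 180 360 ... 1440 1620 180 360; keep is 1 on row 9 only
    ```

    Whether this is actually clearer than the loop is a matter of taste; it just avoids updating the data frame row by row.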

  • 2020-11-28 14:33

    I still don't understand when the sum should restart, or whether it should restart from zero. A desired result would help greatly.

    Nonetheless, I can't help but think that simple indexing and subtraction would be a straightforward way of doing this. The code below gives the same result as @Henrik's solution.

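    # Running total over all rows
    df$difference_sum <- cumsum(df$difference)
    # Which multiple of 1470 each running total falls in (1-based)
    step <- (df$difference_sum %/% 1470) + 1
    # Rows where the running total first crosses into a new multiple
    k <- which(diff(step) > 0) + 1
    df$keep <- 0
    df$keep[k] <- 1
    # The crossing row still belongs to the preceding run
    step[k] <- step[k] - 1
    # Subtract the total at the previous crossing row (0 for the first run)
    df$difference_sum <- df$difference_sum - c(0, df$difference_sum[k])[step]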
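
    As a quick check, the steps above can be wrapped in a small helper and run on sample data. The function name `restart_by_index`, its `limit` argument, and the 11-row sample are all hypothetical, and the result noted in the comment has only been worked through for this one sample.

    ```r
    # Hypothetical helper wrapping the indexing-and-subtraction steps
    restart_by_index <- function(difference, limit = 1470) {
      total <- cumsum(difference)
      step <- (total %/% limit) + 1
      # Rows where the running total crosses into a new multiple of limit
      k <- which(diff(step) > 0) + 1
      keep <- numeric(length(difference))
      keep[k] <- 1
      # The crossing row still belongs to the preceding run
      step[k] <- step[k] - 1
      data.frame(difference_sum = total - c(0, total[k])[step], keep = keep)
    }

    # Hypothetical sample: 11 rows with a constant difference of 180
    res <- restart_by_index(rep(180, 11))
    print(res)
    # difference_sum: 180 360 ... 1440 1620 180 360; keep is 1 on row 9 only
    ```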
    