Cumulative sum with lag

Submitted by 点点圈 on 2019-11-29 06:54:28

You can prepend 0 as the first element and drop the last element with head(x, -1). The cumulative sum for each row then covers only the earlier rows of the same member_id:

transform(df, previous_comments=ave(comment_count, member_id, 
          FUN = function(x) cumsum(c(0, head(x, -1)))))
#  member_id entry_id comment_count           timestamp previous_comments
#1         1        a             4 2008-06-09 12:41:00                 0
#2         1        b             1 2008-07-14 18:41:00                 4
#3         1        c             3 2008-07-17 15:40:00                 5
#4         2        d            12 2008-06-09 12:41:00                 0
#5         2        e            50 2008-09-18 10:22:00                12
#6         3        f             0 2008-10-03 13:36:00                 0
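The answer never shows df being created. The sketch below is a minimal, self-contained version. It assumes the column names and values from the printed output above. Parsing timestamp as POSIXct in UTC is also an assumption. It rebuilds the data and runs the ave approach:

```r
# Rebuild the sample data from the printed output (UTC time zone is an assumption)
df <- data.frame(
  member_id     = c(1, 1, 1, 2, 2, 3),
  entry_id      = c("a", "b", "c", "d", "e", "f"),
  comment_count = c(4, 1, 3, 12, 50, 0),
  timestamp     = as.POSIXct(c("2008-06-09 12:41:00", "2008-07-14 18:41:00",
                               "2008-07-17 15:40:00", "2008-06-09 12:41:00",
                               "2008-09-18 10:22:00", "2008-10-03 13:36:00"),
                             tz = "UTC"),
  stringsAsFactors = FALSE
)

# Within each member_id: prepend 0, drop the last value, then take the running total,
# so each row holds the sum of comment_count over the earlier rows only
res <- transform(df, previous_comments = ave(comment_count, member_id,
                                             FUN = function(x) cumsum(c(0, head(x, -1)))))
print(res$previous_comments)  # 0 4 5 0 12 0
```

All of these approaches depend on row order. They assume the rows are already sorted by timestamp within each member_id, as they are in this sample. Running df[order(df$member_id, df$timestamp), ] first is one way to guarantee that.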

You could use lag from dplyr. Its n argument sets how many rows to lag by:

library(dplyr)
df %>%
    group_by(member_id) %>%
    # lag() shifts by `n` rows; `k` is not a lag() argument
    mutate(previous_comments = lag(cumsum(comment_count), n = 1, default = 0))
#  member_id entry_id comment_count           timestamp previous_comments
#1         1        a             4 2008-06-09 12:41:00                 0
#2         1        b             1 2008-07-14 18:41:00                 4
#3         1        c             3 2008-07-17 15:40:00                 5
#4         2        d            12 2008-06-09 12:41:00                 0
#5         2        e            50 2008-09-18 10:22:00                12
#6         3        f             0 2008-10-03 13:36:00                 0
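The effect of changing n can be seen without dplyr. The sketch below uses lagged_cumsum(), a hypothetical helper that is not part of dplyr or base R. It mimics lag(cumsum(x), n = n, default = 0) on a single group's values. With n = 1 each row gets the total of all earlier rows. With n = 2 the immediately preceding row is skipped as well:

```r
# Hypothetical helper (not a dplyr function): base-R stand-in for
# dplyr::lag(cumsum(x), n = n, default = 0) applied to one group's values
lagged_cumsum <- function(x, n = 1L) {
  total <- cumsum(x)
  if (n == 0L) return(total)                     # no shift: the inclusive running total
  if (n >= length(x)) return(rep(0, length(x)))  # every shifted position falls before the start
  c(rep(0, n), head(total, -n))                  # shift the running total down by n rows
}

x <- c(4, 1, 3)      # comment_count for member_id 1
lagged_cumsum(x, 1)  # 0 4 5
lagged_cumsum(x, 2)  # 0 0 4
```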

Or, using data.table:

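library(data.table)
# setDT() converts df to a data.table by reference; comment_count[-.N] drops each group's last row
setDT(df)[, previous_comments := c(0, cumsum(comment_count[-.N])), by = member_id]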

Alternatively, compute the running total with ave and subtract the current row's comment_count:

transform(df, 
  aggregated_count = ave(comment_count, member_id, FUN = cumsum) - comment_count)
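The subtraction works because an inclusive running total minus the current value equals the exclusive running total. In other words, cumsum(x)[i] - x[i] is the sum of x[1], ..., x[i-1]. The quick base-R check below uses one group's values, taken from the sample output above. It shows that this formulation agrees with the prepend-0 / drop-last approach:

```r
x <- c(4, 1, 3)                          # comment_count for member_id 1
inclusive_minus_current <- cumsum(x) - x
exclusive <- cumsum(c(0, head(x, -1)))   # the prepend-0 / drop-last approach
print(inclusive_minus_current)           # 0 4 5
stopifnot(identical(inclusive_minus_current, exclusive))
```

For integer-valued counts like these, the two results are identical. With non-integer values, the subtraction can introduce tiny floating-point differences, so all.equal() is the safer comparison there.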