r cumsum per group in dplyr

后端 未结 2 1525
终归单人心
终归单人心 2020-12-09 15:43

I am starting to enjoy dplyr but I got stuck on a use case. I want to be able to apply cumsum per group in a dataframe with the package but I can\'

相关标签:
2条回答
  • 2020-12-09 16:20

    Ah. After fiddling around I seem to have found it.

    pdf = df %>% group_by(group) %>% arrange(dates) %>% mutate(cs = cumsum(sales))
    

    Output with forloop in question:

    > pdf = data.frame(dates=as.Date(as.character()), group=as.character(), sales=as.numeric())
    > for(grp in unique(df$group)){
    +   subs = filter(df, group == grp) %>% arrange(dates)
    +   pdf = rbind(pdf, data.frame(dates=subs$dates, group=grp, sales=subs$sales, cs=cumsum(subs$sales)))
    + }
    > pdf
            dates group       sales         cs
    1  2014-01-02     A -0.56047565 -0.5604756
    2  2014-01-03     A -0.23017749 -0.7906531
    3  2014-01-04     A  1.55870831  0.7680552
    4  2014-01-05     A  0.07050839  0.8385636
    5  2014-01-06     A  0.12928774  0.9678513
    6  2014-01-02     B  1.71506499  1.7150650
    7  2014-01-03     B  0.46091621  2.1759812
    8  2014-01-04     B -1.26506123  0.9109200
    9  2014-01-05     B -0.68685285  0.2240671
    10 2014-01-06     B -0.44566197 -0.2215949
    11 2014-01-02     C  1.22408180  1.2240818
    12 2014-01-03     C  0.35981383  1.5838956
    13 2014-01-04     C  0.40077145  1.9846671
    14 2014-01-05     C  0.11068272  2.0953498
    15 2014-01-06     C -0.55584113  1.5395087
    

    Output with this line of code:

    > pdf = df %>% group_by(group) %>% mutate(cs = cumsum(sales))
    > pdf
    Source: local data frame [15 x 4]
    Groups: group
    
            dates group       sales         cs
    1  2014-01-02     A -0.56047565 -0.5604756
    2  2014-01-03     A -0.23017749 -0.7906531
    3  2014-01-04     A  1.55870831  0.7680552
    4  2014-01-05     A  0.07050839  0.8385636
    5  2014-01-06     A  0.12928774  0.9678513
    6  2014-01-02     B  1.71506499  1.7150650
    7  2014-01-03     B  0.46091621  2.1759812
    8  2014-01-04     B -1.26506123  0.9109200
    9  2014-01-05     B -0.68685285  0.2240671
    10 2014-01-06     B -0.44566197 -0.2215949
    11 2014-01-02     C  1.22408180  1.2240818
    12 2014-01-03     C  0.35981383  1.5838956
    13 2014-01-04     C  0.40077145  1.9846671
    14 2014-01-05     C  0.11068272  2.0953498
    15 2014-01-06     C -0.55584113  1.5395087
    
    0 讨论(0)
  • 2020-12-09 16:43

    try using

     group_by(group) %>%
     arrange(group) %>% 
     summarise(cs = sum(sales)) %>%
     mutate(sales_grp = cumsum(cs))
    

    I know this Q/A is a bit dated, but this may help anyone stuck where I was stuck after reading the dplyr posted solution to cumsum() at https://dplyr.tidyverse.org/articles/window-functions.html. and http://www.sthda.com/english/articles/17-tips-tricks/57-dplyr-how-to-add-cumulative-sums-by-groups-into-a-data-framee/

    The solutions posted at the links above don't sum by group. The code only adds the next row in sequence -- which is not a grouped cumulative sum. I would think the OP was looking for the sum of sales for Group A, Group B, and Group C with each group total added to the next -- your total n() in the OPs case should be 3 not 15 with a grouped cumsum(). This would be helpful if you were trying to calculate what percent of your annual sales occurred by the end of October, for example, such that you want the sum of sales by month then the total in sequence over the course of the year.

    So, you need to first summarize the value by each group then sum them up in order of the groups. PS -- this is my first attempt at trying to answer a question on SO and posting a solution that worked for me; so I hope you will please be kind if my answer is wrong. I always try to learn from my mistakes.

    0 讨论(0)
提交回复
热议问题