cumsum in grouped data with dplyr

北海茫月 2021-01-21 22:32

I have a data frame df (which can be downloaded here) referring to a register of companies. It looks something like this:

    Provider.ID        Lo         


        
2 Answers
  • 2021-01-21 22:39

    When you group by Local.Authority and year, each of Kent's groups contains a single row, so cumsum() just returns that row's own total and prints 1, -1, 1. Group by Local.Authority alone instead: cumsum() then accumulates total across all years within each authority and gives 1, 0, 1.

    library(dplyr)

    df <- df %>%
      group_by(Local.Authority) %>%    # one group per authority, spanning all years
      mutate(cum.to = cumsum(total))   # running total within each authority
    
        > df
        Source: local data frame [3 x 8]
        Groups: Local.Authority [1]
    
          Provider.ID Local.Authority month  year entry  exit total cum.to
                <chr>           <chr> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
        1 1-102642676            Kent    10  2010     1     0     1      1
        2 1-102642676            Kent     9  2011     0     1    -1      0
        3 1-102642676            Kent    10  2014     1     0     1      1
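To see the difference between the two groupings side by side, here is a minimal sketch. It assumes a hypothetical toy tibble rebuilt from the three Kent rows printed above, keeping only the columns the comparison needs; the full dataset from the question is not reproduced.

```r
library(dplyr)

# Hypothetical toy data: the three Kent rows printed above,
# keeping only the columns this comparison needs.
kent <- tibble(
  Local.Authority = "Kent",
  month           = c(10, 9, 10),
  year            = c(2010, 2011, 2014),
  total           = c(1, -1, 1)
)

# Grouping by authority AND year: each group holds a single row,
# so cumsum() just returns that row's own total.
by_year <- kent %>%
  group_by(Local.Authority, year) %>%
  mutate(cum.to = cumsum(total)) %>%
  ungroup()

# Grouping by authority only: cumsum() accumulates across all of Kent's years.
by_authority <- kent %>%
  group_by(Local.Authority) %>%
  mutate(cum.to = cumsum(total)) %>%
  ungroup()

print(by_year$cum.to)       # values: 1 -1 1
print(by_authority$cum.to)  # values: 1 0 1
```

Note that cumsum() works in whatever row order each group currently has; it does not sort by date on its own.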
    
  • 2021-01-21 23:04

    I found the solution to my problem. After restarting my R session, I got the expected result by grouping only by Local.Authority and then arranging:

    > df.1 = df %>% group_by(Local.Authority) %>%
    + mutate(cum.total = cumsum(total)) %>%
    + arrange(year, month, Local.Authority)
    > df.1
    Source: local data frame [41 x 8]
    Groups: Local.Authority [36]
    
       Provider.ID  Local.Authority month  year entry  exit total cum.total
            <fctr>           <fctr> <int> <int> <int> <int> <int>     <int>
    1  1-102642676           Bexley    10  2010     1     0     1         1
    2  1-102642676            Brent    10  2010     1     0     1         1
    3  1-102642676 Bristol, City of    10  2010     1     0     1         1
    4  1-102642676             Bury    10  2010     1     0     1         1
    5  1-102642676   Cambridgeshire    10  2010     1     0     1         1
    6  1-102642676    Cheshire East    10  2010     2     0     2         2
    7  1-102642676      East Sussex    10  2010     5     0     5         5
    8  1-102642676          Enfield    10  2010     1     0     1         1
    9  1-102642676            Essex    10  2010     1     0     1         1
    10 1-102642676        Hampshire    10  2010     1     0     1         1
    

    Checking "Kent" now yields the expected result:

    > check = df.1 %>% filter(Local.Authority == "Kent")
    > check
    Source: local data frame [3 x 8]
    Groups: Local.Authority [1]
    
      Provider.ID Local.Authority month  year entry  exit total cum.total
           <fctr>          <fctr> <int> <int> <int> <int> <int>     <int>
    1 1-102642676            Kent    10  2010     1     0     1         1
    2 1-102642676            Kent     9  2011     0     1    -1         0
    3 1-102642676            Kent    10  2014     1     0     1         1
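One caveat about the pipeline above: arrange() runs after mutate(), so it only reorders the printed rows, and cumsum() has already been computed in the data's stored order. That is correct only if the rows were already chronological within each authority. A minimal sketch of the safer order (sort first, then group and accumulate) follows. It uses hypothetical toy data in which the Kent rows are deliberately shuffled and a Bexley row is invented for illustration.

```r
library(dplyr)

# Hypothetical toy data: Kent rows stored out of date order,
# plus one invented Bexley row.
toy <- tibble(
  Local.Authority = c("Kent", "Bexley", "Kent", "Kent"),
  month           = c(10, 10, 10, 9),
  year            = c(2014, 2010, 2010, 2011),
  total           = c(1, 1, 1, -1)
)

# Sort first so cumsum() sees each authority's rows in date order.
sorted_first <- toy %>%
  arrange(Local.Authority, year, month) %>%
  group_by(Local.Authority) %>%
  mutate(cum.total = cumsum(total)) %>%
  ungroup()

# For contrast: accumulate in stored order, then sort only for display.
sorted_after <- toy %>%
  group_by(Local.Authority) %>%
  mutate(cum.total = cumsum(total)) %>%
  ungroup() %>%
  arrange(Local.Authority, year, month)

kent_first <- sorted_first %>% filter(Local.Authority == "Kent") %>% pull(cum.total)
kent_after <- sorted_after %>% filter(Local.Authority == "Kent") %>% pull(cum.total)

print(kent_first)  # values: 1 0 1
print(kent_after)  # values: 2 1 1
```

group_by() itself does not reorder rows, so sorting on year and month before the grouped mutate() is what keeps each authority's running total chronological.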
    