Create sequential counter that restarts on a condition within panel data groups [duplicate]

前端未结

关注

 2  1663

轻奢々

相关标签:

2条回答

渐次进展

2020-12-03 23:44
```
library(splitstackshape)
df$counter <- getanID(cbind(df$country, cumsum(df$event)))[,.id]
```
We take advantage of the fact that you already have zeroes and ones in your event column. That makes indexing much easier. I combine the country column with cumsum(df$event). When that command is run by itself you can see its effect:
```
cumsum(df$event)
 [1] 0 0 1 1 1 2 2 2 3 3
```
It will only increase with each 1 value. When combined with the country, we are able to see the increase grouped by country.

From there, we can create an id column. @AnandaMahto's splitstackshape package has the function getanID for that.
```
 df
   country year event counter
1        A 2000     0       1
2        A 2001     0       2
3        A 2002     1       1
4        A 2003     0       2
5        A 2004     0       3
6        B 2000     1       1
7        B 2001     0       2
8        B 2002     0       3
9        B 2003     1       1
10       B 2004     0       2
```
0 讨论(0)
发布评论:

提交评论
- 加载中...

感情败类

2020-12-03 23:57

With dplyr that would be:

df %>% 
  group_by(country, idx = cumsum(event == 1L)) %>% 
  mutate(counter = row_number()) %>% 
  ungroup %>% 
  select(-idx)

#Source: local data frame [10 x 4]
#
#   country year event counter
#1        A 2000     0       1
#2        A 2001     0       2
#3        A 2002     1       1
#4        A 2003     0       2
#5        A 2004     0       3
#6        B 2000     1       1
#7        B 2001     0       2
#8        B 2002     0       3
#9        B 2003     1       1
#10       B 2004     0       2

Or using data.table:

library(data.table)
setDT(df)[, counter := seq_len(.N), by = list(country, cumsum(event == 1L))]

Edit: group_by(country, idx = cumsum(event == 1L)) is used to group by country and a new grouping index "idx". The event == 1L part creates a logical index telling us whether the column "event" is an integer 1 or not (TRUE/FALSE). Then, cumsum(...) sums up starting from 0 for the first 2 rows, 1 for the next 3, 2 for the next 3 and so on. We use this new column (+ country) to group the data as needed. You can check it out if you remove the last to pipe-parts in the dplyr code.

0 讨论(0)

热议问题