I would like to add a counter that restarts from 1 whenever a condition in an existing column is met.
For example, I have the following data frame:
df
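The data frame itself is not reproduced here. As a rough sketch for anyone who wants to run the answers, it can probably be rebuilt from the `x1` and `x2` columns in the printed results further down; the values are read off that output, and storing `x1` as numeric and `x2` as character is an assumption.

```r
# Reconstructed from the printed output in the answers (assumed column types)
df <- data.frame(
  x1 = c(10, 100, 200, 300, 87, 90, 45, 80),
  x2 = c("start", "a", "b", "c", "start", "k", "l", "o"),
  stringsAsFactors = FALSE  # keep x2 as character, not factor
)
```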
Using base R:
df$x3 <- with(df, ave(x1, cumsum(x2 == 'start'), FUN = seq_along))
gives:
> df
   x1    x2 x3
1  10 start  1
2 100     a  2
3 200     b  3
4 300     c  4
5  87 start  1
6  90     k  2
7  45     l  3
8  80     o  4
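To see why this works, it may help to look at the intermediate values. The sketch below uses a standalone `x2` vector (copied from the output above) and the helper names `is_start` and `grp`, which are made up for illustration: `cumsum` over the logical test gives every row a group id that goes up by one at each 'start', and `ave` then numbers the rows within each group.

```r
# Standalone copy of the x2 column (values taken from the output above)
x2 <- c("start", "a", "b", "c", "start", "k", "l", "o")

# TRUE wherever a new run begins
is_start <- x2 == "start"

# Running total of TRUE values: the group id increases at each 'start'
grp <- cumsum(is_start)
print(grp)  # 1 1 1 1 2 2 2 2

# Number the elements within each group
x3 <- ave(seq_along(x2), grp, FUN = seq_along)
print(x3)   # 1 2 3 4 1 2 3 4
```

Rows that come before the first 'start' (if any) would fall into group 0 and still get their own count from 1.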
Or with the dplyr or data.table packages:
library(dplyr)
df %>%
  group_by(grp = cumsum(x2 == 'start')) %>%
  mutate(x3 = row_number())
library(data.table)
# option 1
setDT(df)[, x3 := rowid(cumsum(x2 == 'start'))][]
# option 2
setDT(df)[, x3 := 1:.N, by = cumsum(x2 == 'start')][]
Here is another base R method:
df$x3 <- sequence(diff(c(which(df$x2 == "start"), nrow(df)+1)))
which returns
df
   x1    x2 x3
1  10 start  1
2 100     a  2
3 200     b  3
4 300     c  4
5  87 start  1
6  90     k  2
7  45     l  3
8  80     o  4
sequence takes an integer vector and, for each entry n, returns the count from 1 to n, with all the counts concatenated. The length of each count comes from diff, which takes the differences between the positions where each sequence starts. Because the final sequence has no following start position, we have to append the position just past the last row of the data.frame, nrow(df)+1.
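Broken into steps, the one-liner looks roughly like this. It is only a sketch on a standalone `x2` vector (values copied from the output above), and the intermediate names `starts`, `bounds` and `run_lengths` are made up for illustration:

```r
# Standalone copy of the x2 column (values taken from the output above)
x2 <- c("start", "a", "b", "c", "start", "k", "l", "o")

# Positions where each run begins
starts <- which(x2 == "start")
print(starts)       # 1 5

# Append the position just past the last element
bounds <- c(starts, length(x2) + 1)
print(bounds)       # 1 5 9

# Differences between consecutive boundaries are the run lengths
run_lengths <- diff(bounds)
print(run_lengths)  # 4 4

# Count from 1 up to each run length and concatenate
x3 <- sequence(run_lengths)
print(x3)           # 1 2 3 4 1 2 3 4
```

Note that this relies on the first row being a 'start'. If it is not, the result is shorter than the data and the assignment to df$x3 fails, whereas the cumsum-based versions still work.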