I have a data frame with three columns: user_id, event, and time.
user_id is a user ID, event is either "A" or "B", and time is the time of the event. I need to count the number of "B" events that precede each "A".
Using @r2Evans' data:
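# Start with NA everywhere; only "A" rows receive a count.
x$y <- NA
# Row positions of the "A" events.
which_ <- which(x$event == "A")
# Gap between consecutive "A" rows, minus 1 = number of "B" rows in between (ignores user_id boundaries).
x$y[which_] <- diff(c(0, which_)) - 1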
# user_id event date_time desired_column y
# 1 1 B 2018-01-01 NA NA
# 2 1 B 2018-01-02 NA NA
# 3 1 B 2018-01-03 NA NA
# 4 1 B 2018-01-04 NA NA
# 5 1 B 2018-01-05 NA NA
# 6 1 A 2018-01-06 5 5
# 7 1 B 2018-01-07 NA NA
# 8 1 A 2018-01-08 1 1
# 9 2 B 2018-01-05 NA NA
# 10 2 B 2018-01-06 NA NA
# 11 2 A 2018-01-07 2 2
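One caveat with the which/diff idea: it works on this data because every user's rows end with an "A". If a user ended on a "B", that row would be counted toward the next user's first "A". Here is a rough sketch of doing the same count within each user_id using base R's ave(). The toy data and the helper name count_b_before_a are made up for illustration, and I am assuming that an "A" directly after another "A" should get 0, which the question does not specify.

```r
# Toy data (hypothetical): user 1 ends on a "B", so an ungrouped count
# would carry that "B" over to user 2's first "A".
x <- data.frame(
  user_id = c(1, 1, 1, 1, 2, 2, 2),
  event   = c("B", "B", "A", "B", "B", "A", "A"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: takes a 0/1 vector marking "A" rows for one user and
# returns, at each "A", the number of rows since the previous "A"
# (or since the start of that user's rows). Other rows stay NA.
count_b_before_a <- function(is_a) {
  out <- rep(NA_integer_, length(is_a))
  idx <- which(is_a == 1L)
  out[idx] <- diff(c(0L, idx)) - 1L
  out
}

# ave() applies the helper within each user_id and keeps the row order.
x$y <- ave(as.integer(x$event == "A"), x$user_id, FUN = count_b_before_a)
x$y
# [1] NA NA  2 NA NA  1  0
```

Without the grouping, row 6 would get 2 instead of 1, because the trailing "B" of user 1 would be included.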
x <- read.table(header=TRUE, stringsAsFactors=FALSE, text='
user_id event date_time desired_column
1 B 2018-01-01 NA
1 B 2018-01-02 NA
1 B 2018-01-03 NA
1 B 2018-01-04 NA
1 B 2018-01-05 NA
1 A 2018-01-06 5
1 B 2018-01-07 NA
1 A 2018-01-08 1
2 B 2018-01-05 NA
2 B 2018-01-06 NA
2 A 2018-01-07 2')
Perhaps a little clunky, but ...
(Edit: specified dplyr::lag, since stats::lag doesn't do what we need.)
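x$a <- NA
# Compute the run-length encoding once: consecutive runs of identical events.
r <- rle(x$event)
# Store each run's length on the last row of that run.
x$a[cumsum(r$lengths)] <- r$lengths
# Shift down one row so each "B" run length lands on the "A" row that follows it.
x$a <- dplyr::lag(x$a)
# Keep the counts only on "A" rows.
x$a[x$event == "B"] <- NA
x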
# user_id event date_time desired_column a
# 1 1 B 2018-01-01 NA NA
# 2 1 B 2018-01-02 NA NA
# 3 1 B 2018-01-03 NA NA
# 4 1 B 2018-01-04 NA NA
# 5 1 B 2018-01-05 NA NA
# 6 1 A 2018-01-06 5 5
# 7 1 B 2018-01-07 NA NA
# 8 1 A 2018-01-08 1 1
# 9 2 B 2018-01-05 NA NA
# 10 2 B 2018-01-06 NA NA
# 11 2 A 2018-01-07 2 2
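In case the rle() step is opaque, here is a small walk-through on a made-up vector showing what each intermediate value looks like. It is only a sketch: the event vector is hypothetical, and I use c(NA, head(a, -1)) as a base R stand-in for dplyr::lag so that it runs without dplyr.

```r
# Hypothetical event sequence for a single user.
event <- c("B", "B", "B", "A", "B", "A")

runs <- rle(event)
runs$values                    # "B" "A" "B" "A"
runs$lengths                   # 3 1 1 1
ends <- cumsum(runs$lengths)   # 3 4 5 6: last row of each run

# Put each run's length on the last row of that run.
a <- rep(NA_integer_, length(event))
a[ends] <- runs$lengths        # NA NA 3 1 1 1

# Shift down by one row (base R equivalent of dplyr::lag with default n = 1).
a <- c(NA, head(a, -1))        # NA NA NA 3 1 1

# Drop the values that landed on "B" rows.
a[event == "B"] <- NA
a
# [1] NA NA NA  3 NA  1
```

Like the original, this does not look at user_id, so a run of "B" rows that spans two users would be treated as one run.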