Question
I'm trying to select the 100 rows before and after a marker in a relatively large data frame. The markers are sparse, and I haven't been able to work out a solution or find one. This doesn't seem like it should be hard, so I'm probably missing something obvious.
Here's a small, simplified example of what the data looks like:
timestamp talking_yn transition_yn
0.01 n n
0.02 n n
0.03 n n
0.04 n n
0.05 n n
0.06 n n
0.07 n n
0.08 n n
0.09 n n
0.10 n n
0.11 y y
0.12 y n
0.13 y n
0.14 y n
0.15 y n
0.16 y n
0.17 y n
0.18 y n
I've tried methods from a variety of answers (lag from zoo or dplyr), but they all focus on selecting a single row or on subsetting only the rows that carry the marker. For the dummy example data, how would I select the 5 rows before and after the transition_yn == 'y' row?
Answer 1:
I have a quick function for that:
#' Lead/Lag a logical
#'
#' @param lgl logical vector
#' @param bef integer, number of elements to lead by
#' @param aft integer, number of elements to lag by
#' @return logical, same length as 'lgl'
#' @export
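leadlag <- function(lgl, bef = 1, aft = 1) {
  n <- length(lgl)
  # clamp both window sizes to the range [0, n]
  bef <- min(n, max(0, bef))
  aft <- min(n, max(0, aft))
  # column b is 'lgl' shifted b positions earlier: flags the row b places before each TRUE
  befx <- if (bef > 0) sapply(seq_len(bef), function(b) c(tail(lgl, n = -b), rep(FALSE, b)))
  # column a is 'lgl' shifted a positions later: flags the row a places after each TRUE
  aftx <- if (aft > 0) sapply(seq_len(aft), function(a) c(rep(FALSE, a), head(lgl, n = -a)))
  # a row is kept if it is flagged in any column
  rowSums(cbind(befx, lgl, aftx), na.rm = TRUE) > 0
}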
dat[leadlag(dat$transition_yn == 'y', 2, 4),]
# timestamp talking_yn transition_yn
# 9 0.09 n n
# 10 0.10 n n
# 11 0.11 y y
# 12 0.12 y n
# 13 0.13 y n
# 14 0.14 y n
# 15 0.15 y n
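For the 5-before/5-after case asked about in the question, an index-based variant may be easier to follow. The following is only a sketch and not part of the original answer: `rows_around` is a hypothetical helper name, and it assumes `bef` and `aft` are non-negative whole numbers. It builds one window of row indices around each marker, drops indices that fall outside the data, and merges overlapping windows.

```r
# Hypothetical helper (an assumption, not from the original answer):
# return the sorted, unique positions lying within `bef` places before
# and `aft` places after each TRUE in `lgl`.
rows_around <- function(lgl, bef = 1, aft = 1) {
  hits <- which(lgl)                       # positions of the markers (NAs are ignored)
  if (length(hits) == 0) return(integer(0))
  # one window of indices per marker
  idx <- unlist(lapply(hits, function(i) seq(i - bef, i + aft)))
  # drop indices outside the vector, then merge overlapping windows
  idx <- idx[idx >= 1 & idx <= length(lgl)]
  sort(unique(idx))
}

# Same values as the example data, rebuilt inline so the snippet runs on its own.
dat <- data.frame(
  timestamp     = (1:18) / 100,
  talking_yn    = rep(c("n", "y"), times = c(10, 8)),
  transition_yn = c(rep("n", 10), "y", rep("n", 7)),
  stringsAsFactors = FALSE
)

keep <- rows_around(dat$transition_yn == "y", bef = 5, aft = 5)
dat[keep, ]
```

On the example data this keeps rows 6 through 16. With sparse markers in a large data frame, working from `which()` avoids building one shifted copy of the logical vector per offset, which may matter for a window of 100 rows on each side.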
Data
dat <- read.table(header=TRUE, stringsAsFactors=FALSE, text="
timestamp talking_yn transition_yn
0.01 n n
0.02 n n
0.03 n n
0.04 n n
0.05 n n
0.06 n n
0.07 n n
0.08 n n
0.09 n n
0.10 n n
0.11 y y
0.12 y n
0.13 y n
0.14 y n
0.15 y n
0.16 y n
0.17 y n
0.18 y n")
Source: https://stackoverflow.com/questions/58716917/select-rows-around-a-marker-r