Question
This is a follow-up to my previous question, "R, how to group by row value? Split?".
The input data frame has changed to:
library(stringr)  # provides str_c()
id = str_c("x", 1:22)
val = c(rep("NO1", 2), "START", rep("yes1", 2), "STOP", "NO",
        "START", "NO1", "START", rep("yes2", 3), "STOP", "NO1",
        "START", rep("NO3", 3), "STOP", "NO1", "STOP")
data = data.frame(id, val)
The expected output is a data frame whose val column looks like this:
val = c("START", rep("yes1", 2), "STOP",
        "START", "NO1", "START", rep("yes2", 3), "STOP",
        "START", rep("NO3", 3), "STOP", "NO1", "STOP")
Answer 1:
Put simply: first drop every entry that is neither START nor STOP. Among the markers that remain, a START is a valid start point if it is the first START or is immediately preceded by a STOP; likewise, a STOP is a valid end point if it is the last STOP or is immediately followed by a START. Consider this function:
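valid_anchors <- function(x) {
  # Keep only the START/STOP markers and remember their original positions
  are_anchors <- x %in% c("START", "STOP")
  id <- seq_along(x)[are_anchors]
  x <- x[are_anchors]
  # A START is valid if no marker precedes it or the previous marker is a STOP
  start_pos <- which(x == "START" & c("", head(x, -1L)) %in% c("", "STOP"))
  # A STOP is valid if no marker follows it or the next marker is a START
  stop_pos <- which(x == "STOP" & c(tail(x, -1L), "") %in% c("", "START"))
  # Original row indices of the valid starts and the valid stops
  list(id[start_pos], id[stop_pos])
}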
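To see the rule at work, here is a small trace of the same steps on a toy vector. The vector and the names pos, anchors, prev, nxt and steps are made up for illustration and are not part of the original answer; treat it as a sketch rather than a replacement for the function.

```r
# Toy input (invented for illustration; not taken from the question)
x <- c("NO", "START", "a", "START", "b", "STOP", "c", "STOP")

are_anchors <- x %in% c("START", "STOP")
pos     <- which(are_anchors)          # original positions of the markers
anchors <- x[are_anchors]              # the markers only
prev    <- c("", head(anchors, -1L))   # marker before each one ("" = none)
nxt     <- c(tail(anchors, -1L), "")   # marker after each one ("" = none)

# One row per marker, with the two validity tests side by side
steps <- data.frame(
  pos, anchors, prev, nxt,
  valid_start = anchors == "START" & prev %in% c("", "STOP"),
  valid_stop  = anchors == "STOP"  & nxt  %in% c("", "START")
)
print(steps)
```

Here only the START at position 2 and the STOP at position 8 pass their tests, so the whole run from 2 to 8 counts as one block, including the inner START and STOP.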
Then apply the same indexing approach you got in your last post:
ind <- valid_anchors(data$val)
data[sort(unique(unlist(mapply(`:`, ind[[1]], ind[[2]])))), ]
Output
    id   val
3   x3 START
4   x4  yes1
5   x5  yes1
6   x6  STOP
8   x8 START
9   x9   NO1
10 x10 START
11 x11  yes2
12 x12  yes2
13 x13  yes2
14 x14  STOP
16 x16 START
17 x17   NO3
18 x18   NO3
19 x19   NO3
20 x20  STOP
21 x21   NO1
22 x22  STOP
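If you want to confirm this against the expected val column from the question, something like the following standalone check should work. It is only a sketch: it rebuilds the input with base R's paste0() instead of str_c(), repeats valid_anchors so the snippet runs on its own, and swaps mapply() for Map(), which always returns a list. The names keep, result and expected are my own.

```r
# Rebuild the question's input using base R only
id  <- paste0("x", 1:22)
val <- c(rep("NO1", 2), "START", rep("yes1", 2), "STOP", "NO",
         "START", "NO1", "START", rep("yes2", 3), "STOP", "NO1",
         "START", rep("NO3", 3), "STOP", "NO1", "STOP")
data <- data.frame(id, val, stringsAsFactors = FALSE)

# Same function as in the answer, repeated so this snippet is self-contained
valid_anchors <- function(x) {
  are_anchors <- x %in% c("START", "STOP")
  id <- seq_along(x)[are_anchors]
  x <- x[are_anchors]
  start_pos <- which(x == "START" & c("", head(x, -1L)) %in% c("", "STOP"))
  stop_pos <- which(x == "STOP" & c(tail(x, -1L), "") %in% c("", "START"))
  list(id[start_pos], id[stop_pos])
}

ind <- valid_anchors(data$val)

# Expand each (start, stop) pair into a run of row indices
keep <- sort(unique(unlist(Map(`:`, ind[[1]], ind[[2]]))))
result <- data[keep, ]

# Expected val column, copied from the question
expected <- c("START", rep("yes1", 2), "STOP",
              "START", "NO1", "START", rep("yes2", 3), "STOP",
              "START", rep("NO3", 3), "STOP", "NO1", "STOP")

identical(result$val, expected)  # TRUE
```

For this input the valid starts are rows 3, 8 and 16 and the valid stops are rows 6, 14 and 22, which gives the three blocks shown in the output above.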
Source: https://stackoverflow.com/questions/64619727/r-how-to-group-by-split-or-subset-by-row-values