Delete runs of certain value before and after specific value

北城余情 submitted on 2020-07-09 08:32:18

Question


I have a data frame with several columns. Based on the column 'activity', I want to remove entire contiguous runs of a specific value, 'pt', but only when they occur immediately before or after a run of 'outside'.

In the simplified data below, there is one run where 'activity' is 'outside', with a chunk of 'pt' immediately before it and another immediately after it. These two 'pt' chunks should be removed.

   activity dist
1      home    1
2        pt    2 # <- run of 'pt' before run of 'outside': remove
3        pt    3 # <-
4        pt    4 # <- 
5   outside    5
6   outside    6
7        pt    7 # <- run of 'pt' after run of 'outside': remove
8        pt    8 # <-
9      work    9
10       pt   10
11       pt   11
12     home   12

Thus, the desired output is:

    activity dist 
 1      home    1 
 2   outside    5 
 3   outside    6 
 4      work    9 
 5        pt   10 
 6        pt   11 
 7      home   12 

How can this be achieved?


dput of data, assigned to d:

d <- structure(list(activity = c("home", "pt", "pt", "pt", "outside", "outside", "pt", "pt", "work", "pt", "pt", "home"),
                    dist = 1:12),
               class = "data.frame", row.names = c(NA, -12L))

Answer 1:


You may use two convenience functions from the data.table package: rleid to "[g]enerate run-length type group id", and shift to get the values before and after the focal index in a vector.

library(data.table)
setDT(d)
d[ , r := rleid(activity)]

d[!(r %in% r[activity == "pt" & shift(activity, type = "lead") == "outside" |
               shift(activity) == "outside" & activity == "pt"])]

#    activity dist r
# 1:     home    1 1
# 2:  outside    5 3
# 3:  outside    6 3
# 4:     work    9 5
# 5:       pt   10 6
# 6:       pt   11 6
# 7:     home   12 7

Explanation:

Coerce your data.frame to a data.table (setDT(d)). Create a run-length index of 'activity' (rleid). Check whether the current value is 'pt' and the next value is 'outside' (activity == "pt" & shift(activity, type = "lead") == "outside"), or (|) whether the current value is 'pt' and the previous value is 'outside' (activity == "pt" & shift(activity) == "outside"). Only the 'pt' row directly adjacent to an 'outside' row satisfies this condition, which is why the run id is needed to reach the rest of its run.

Where this condition is TRUE, grab the ids of the runs to be removed (r[<condition>]). Check whether each row's run id is among the runs to be removed (r %in% <run groups to be removed>). If so, do not (!) keep these rows when indexing the data (d[<condition>]).
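To make the individual steps easier to inspect, the same logic can be split into named intermediate results. This is only a sketch of the approach described above: the names touches_outside, drop_runs and res are introduced here for illustration, and fill = "" is an added choice (not part of the original answer) so that the comparisons at the first and last row yield FALSE instead of NA.

```r
library(data.table)

d <- data.table(
  activity = c("home", "pt", "pt", "pt", "outside", "outside",
               "pt", "pt", "work", "pt", "pt", "home"),
  dist = 1:12
)

# Step 1: give every contiguous run of 'activity' its own id.
d[, r := rleid(activity)]

# Step 2: flag 'pt' rows that sit directly next to an 'outside' row.
# fill = "" (an assumption of this sketch) pads the ends with a
# value that never matches, so no NA appears in the flag.
d[, touches_outside := activity == "pt" &
      (shift(activity, type = "lead", fill = "") == "outside" |
       shift(activity, type = "lag",  fill = "") == "outside")]

# Step 3: collect the ids of runs that contain a flagged row.
drop_runs <- d[(touches_outside), unique(r)]

# Step 4: keep every row whose run id is not in that set.
res <- d[!(r %in% drop_runs), .(activity, dist)]
res
```

Only rows 4 and 7 are flagged in step 2, but because they carry run ids 2 and 4, step 4 removes the complete 'pt' runs they belong to.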


A base R alternative using rle:

The values of runs of 'pt' before or after 'outside' are replaced with NA. The rle is converted back to a vector (inverse.rle) and rows with NA are removed (na.omit).

Obviously, if there are rows with NA in the original data set which you want to keep, you need to use another value for replacement.

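# Run-length encode 'activity'; r$values holds one entry per run
r <- rle(d$activity)

# Indices of 'pt' runs immediately before / after an 'outside' run
before <- which(head(r$values, -1) == "pt" & tail(r$values, -1) == "outside")
after  <- which(head(r$values, -1) == "outside" & tail(r$values, -1) == "pt") + 1

# Mark those runs with NA, expand back to one value per row, drop NA rows
r$values[c(before, after)] <- NA
d$activity <- inverse.rle(r)
na.omit(d)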
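If the data may already contain NA values that should be kept, one possible variation is to skip the NA sentinel and build a logical row index from the run-length encoding instead. The following is a minimal sketch under that assumption; prev_v, next_v, drop_run, keep and res are illustrative names, not part of the original answer.

```r
d <- data.frame(
  activity = c("home", "pt", "pt", "pt", "outside", "outside",
               "pt", "pt", "work", "pt", "pt", "home"),
  dist = 1:12,
  stringsAsFactors = FALSE
)

r <- rle(d$activity)
v <- r$values
n <- length(v)

# Value of the previous and next run; "" pads both ends with a
# value that matches neither "pt" nor "outside".
prev_v <- c("", v[-n])
next_v <- c(v[-1], "")

# One flag per run: a 'pt' run adjacent to an 'outside' run.
drop_run <- v == "pt" & (prev_v == "outside" | next_v == "outside")

# Expand the per-run flag to one flag per row and keep the rest.
keep <- !rep(drop_run, r$lengths)
res <- d[keep, ]
res
```

Because no values in d are overwritten, existing NA entries in other columns are left untouched, and the original row names (1, 5, 6, 9, 10, 11, 12) are preserved in res.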


Source: https://stackoverflow.com/questions/62454188/delete-runs-of-certain-value-before-and-after-specific-value
