R dplyr: Filter data by multiple Regex expressions defined by vector

长发绾君心 2021-01-20 14:16

I have a data frame from which I want to select some important columns, and then filter the rows so that only those whose values end with one of several specific endings are kept.

Regex expressions make it simple to define my …

1 Answer
  •  小蘑菇 (OP)
    2021-01-20 14:39

    You may use

    library(dplyr)
    df %>%
      select(x, y) %>%
      filter(grepl(paste0("(?:", paste(ids, collapse = "|"), ")$"), y))
    

    The paste0("(?:", paste(ids, collapse="|"), ")$") part builds an alternation pattern that matches only at the end of the string, because of the $ anchor at the end.
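    A minimal base-R sketch of the pattern construction described above, assuming a hypothetical vector of endings `ids` and hypothetical values for column `y` (neither comes from the question). It prints the built pattern and the logical vector that `grepl()` returns, which is what `filter()` uses to keep rows. The sketch passes `perl = TRUE` as an assumption, so the non-capturing group `(?:...)` is handled by PCRE; the answer's own calls use the default engine.

```r
# Hypothetical endings to look for (not from the original question)
ids <- c("_a", "_b")
# Hypothetical values of column y
y <- c("x_a", "x_b", "x_c", "_a_x")

# Join the endings with | and wrap them in a non-capturing group anchored by $
pattern <- paste0("(?:", paste(ids, collapse = "|"), ")$")
print(pattern)  # [1] "(?:_a|_b)$"

# TRUE only where y ends with one of the ids ("_a_x" contains "_a" but not at the end)
keep <- grepl(pattern, y, perl = TRUE)
print(keep)     # [1]  TRUE  TRUE FALSE FALSE
```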

    NOTE: If the values can contain special regex metacharacters, you need to escape the values in the character vector first:

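    regex.escape <- function(string) {
      # Prefix every regex metacharacter with a backslash
      gsub("([][{}()+*^$|\\\\?.])", "\\\\\\1", string)
    }
    df %>%
      select(x, y) %>%
      # regex.escape(ids) escapes the values before they are joined with |
      filter(grepl(paste0("(?:", paste(regex.escape(ids), collapse = "|"), ")$"), y))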
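    A small sketch of why the escaping step matters, reusing the answer's `regex.escape()` helper with hypothetical endings, one of which contains a `.`. Unescaped, `.` matches any character, so `"xab"` is wrongly kept; escaped, only a literal `.b` ending matches. The values in `ids` and `y` are made up for illustration, and `perl = TRUE` on the matching calls is an assumption of this sketch.

```r
# The answer's helper: prefix every regex metacharacter with a backslash
regex.escape <- function(string) {
  gsub("([][{}()+*^$|\\\\?.])", "\\\\\\1", string)
}

# Hypothetical endings; ".b" contains the metacharacter "."
ids <- c(".b", "_c")
y <- c("x.b", "xab", "x_c", "xc")

raw_pattern  <- paste0("(?:", paste(ids, collapse = "|"), ")$")
safe_pattern <- paste0("(?:", paste(regex.escape(ids), collapse = "|"), ")$")

# Unescaped: "." matches any character, so "xab" slips through
print(grepl(raw_pattern, y, perl = TRUE))   # [1]  TRUE  TRUE  TRUE FALSE
# Escaped: only a literal ".b" or "_c" at the end matches
print(grepl(safe_pattern, y, perl = TRUE))  # [1]  TRUE FALSE  TRUE FALSE
```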
    

    For example, paste0("(?:", paste(c("7", "8", "ids"), collapse="|"), ")$") will output (?:7|8|ids)$:

    • (?: - start of a non-capturing group that acts as a container for the alternatives, so that the $ anchor applies to all of them and not just to the last one; it matches any of
      • 7 - a 7 char
      • | - or
      • 8 - an 8 char
      • | - or
      • ids - an ids substring
    • ) - end of the group
    • $ - end of the string.
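    A quick check of the breakdown above, building the example pattern exactly as in the answer and testing it on a few hypothetical strings. The second call drops the group to show that, without it, $ binds only to the last alternative, so "78x" is wrongly matched by its leading 7. `perl = TRUE` is an assumption of this sketch.

```r
# Build the example pattern from the answer
pattern <- paste0("(?:", paste(c("7", "8", "ids"), collapse = "|"), ")$")
print(pattern)  # [1] "(?:7|8|ids)$"

# Hypothetical test strings
s <- c("a7", "b8", "my_ids", "78x", "ids_")

# With the group: every alternative must sit at the end of the string
print(grepl(pattern, s, perl = TRUE))      # [1]  TRUE  TRUE  TRUE FALSE FALSE

# Without the group: only "ids" is anchored, 7 and 8 may appear anywhere
print(grepl("7|8|ids$", s, perl = TRUE))   # [1]  TRUE  TRUE  TRUE  TRUE FALSE
```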
