How do I deal with special characters like \^$.?*|+()[{ in my regex?

花落未央 2020-11-21 04:36

I want to match a regular expression special character, \^$.?*|+()[{. I tried:

x <- "a[b"
grepl("[", x)
## Error: invalid regular expre


        
2 Answers
  • 2020-11-21 05:24

    I think the easiest way to match the characters like

    \^$.?*|+()[
    

    is to use character classes from within R. Consider the following, which cleans the column headers of a data file that may contain spaces and punctuation characters:

    library(stringr)
    # Strip punctuation and whitespace from the column names of order_table
    colnames(order_table) <- str_replace_all(colnames(order_table), "[:punct:]|[:space:]", "")
    

    This approach lets us chain character classes to match punctuation characters as well as whitespace characters, which you would otherwise have to escape with \\ to detect. You can learn more about the character classes in the cheatsheet below, and you can also type ?regexp for more information.

    https://www.rstudio.com/wp-content/uploads/2016/09/RegExCheatsheet.pdf
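
For comparison, here is a minimal base R sketch of the same clean-up. It assumes a made-up vector of header names (`headers` is illustrative, not from the original data). Note that in base R the POSIX classes have to sit inside an outer bracket expression, so both classes go in one set.

```r
# Illustrative column headers; the names are invented for this example.
headers <- c("Order ID", "unit.price ($)", "ship-date")

# In base R, [:punct:] and [:space:] are only valid inside [ ],
# so they are combined into a single bracket expression.
clean <- gsub("[[:punct:][:space:]]", "", headers)
clean
## [1] "OrderID"   "unitprice" "shipdate"
```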

  • 2020-11-21 05:25

    Escape with a double backslash

    R treats backslashes as escape characters in character constants, and so do regular expressions. Hence the need for two backslashes when supplying a pattern as a character argument: the first one isn't actually a character, but rather makes the second one into a character. You can see how strings are processed using cat.

    y <- "double quote: \", tab: \t, newline: \n, unicode point: \u20AC"
    print(y)
    ## [1] "double quote: \", tab: \t, newline: \n, unicode point: €"
    cat(y)
    ## double quote: ", tab:    , newline: 
    ## , unicode point: €
    

    Further reading: Escaping a backslash with a backslash in R produces 2 backslashes in a string, not 1
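
One way to check how many characters a string literal really contains is `nchar()`. The sketch below uses only base R; the variable names are illustrative.

```r
# Each pair of backslashes in the source code is one backslash in the string.
one_backslash <- "\\"
escaped_bracket <- "\\["

nchar(one_backslash)     # 1
nchar(escaped_bracket)   # 2: a backslash followed by [

# writeLines() prints the characters that the regex engine actually receives.
writeLines(escaped_bracket)
## \[
```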

    To use special characters in a regular expression the simplest method is usually to escape them with a backslash, but as noted above, the backslash itself needs to be escaped.

    grepl("\\[", "a[b")
    ## [1] TRUE
    

    To match backslashes, you need to double escape, resulting in four backslashes.

    grepl("\\\\", c("a\\b", "a\nb"))
    ## [1]  TRUE FALSE
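
If the text to match arrives in a variable, the escaping can be automated. The helper below is a hypothetical sketch (`escape_regex` is not a base R function), and it assumes the metacharacters of interest are the ones listed in the question plus the closing `]` and `}`.

```r
# Hypothetical helper: put a backslash in front of every regex metacharacter
# so that the result matches its input literally.
escape_regex <- function(x) {
  # perl = TRUE so that \\ , \[ and \] are well defined inside the class.
  gsub("([\\\\^$.?*|+(){}\\[\\]])", "\\\\\\1", x, perl = TRUE)
}

escaped <- escape_regex("a[b")
writeLines(escaped)
## a\[b

grepl(escaped, "a[b")
## [1] TRUE

# Without escaping, "1+1" would mean "one or more 1s, then a 1".
plus_result <- grepl(escape_regex("1+1"), c("1+1", "11"))
plus_result
## [1]  TRUE FALSE
```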
    

    The rebus package contains constants for each of the special characters to save you mistyping slashes.

    library(rebus)
    OPEN_BRACKET
    ## [1] "\\["
    BACKSLASH
    ## [1] "\\\\"
    

    For more examples see:

    ?SpecialCharacters
    

    Your problem can be solved this way:

    library(rebus)
    grepl(OPEN_BRACKET, "a[b")
    

    Form a character class

    You can also wrap the special characters in square brackets to form a character class.

    grepl("[?]", "a?b")
    ## [1] TRUE
    

    Two of the special characters have special meaning inside character classes: \ and ^.

    Backslash still needs to be escaped even if it is inside a character class.

    grepl("[\\\\]", c("a\\b", "a\nb"))
    ## [1]  TRUE FALSE
    

    Caret only needs to be escaped if it is directly after the opening square bracket.

    grepl("[ ^]", "a^b")  # matches spaces as well.
    ## [1] TRUE
    grepl("[\\^]", "a^b") 
    ## [1] TRUE
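
The reason for the rule is that a caret placed first negates the class. A small sketch with an invented test vector shows the three cases side by side:

```r
x <- c("aaa", "a^a", "b")

# Caret first: negated class, matches any character other than "a".
negated <- grepl("[^a]", x)
negated
## [1] FALSE  TRUE  TRUE

# Caret not first: matches "a" or a literal "^".
literal_or_a <- grepl("[a^]", x)
literal_or_a
## [1]  TRUE  TRUE FALSE

# Escaped caret: matches a literal "^" only.
literal_only <- grepl("[\\^]", x)
literal_only
## [1] FALSE  TRUE FALSE
```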
    

    rebus also lets you form a character class.

    char_class("?")
    ## <regex> [?]
    

    Use a pre-existing character class

    If you want to match all punctuation, you can use the [:punct:] character class.

    grepl("[[:punct:]]", c("//", "[", "(", "{", "?", "^", "$"))
    ## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE
    

    stringi maps this to the Unicode General Category for punctuation, so its behaviour is slightly different.

    library(stringi)
    stri_detect_regex(c("//", "[", "(", "{", "?", "^", "$"), "[[:punct:]]")
    ## [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE
    

    You can also use the cross-platform syntax for accessing a UGC.

    stri_detect_regex(c("//", "[", "(", "{", "?", "^", "$"), "\\p{P}")
    ## [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE
    

    Use \Q \E escapes

    Placing characters between \\Q and \\E makes the regular expression engine treat them literally rather than as regular expressions.

    grepl("\\Q.\\E", "a.b")
    ## [1] TRUE
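
Because `.` matches any character, the example above would return TRUE even without the quoting. The sketch below uses a pattern where the difference is visible. It passes `perl = TRUE` as an assumption, to be explicit about which engine handles `\Q...\E`; the `needle` variable is illustrative.

```r
x <- c("1+1", "11")

# Unquoted: "+" is a quantifier, so the pattern means "one or more 1s, then a 1".
unquoted <- grepl("1+1", x, perl = TRUE)
unquoted
## [1] FALSE  TRUE

# Quoted: every character between \Q and \E is matched literally.
quoted <- grepl("\\Q1+1\\E", x, perl = TRUE)
quoted
## [1]  TRUE FALSE

# Quoting text held in a variable. This breaks if the text itself contains \E.
needle <- "(a|b)"
from_variable <- grepl(paste0("\\Q", needle, "\\E"), c("(a|b)", "a"), perl = TRUE)
from_variable
## [1]  TRUE FALSE
```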
    

    rebus lets you write literal blocks of regular expressions.

    literal(".")
    ## <regex> \Q.\E
    

    Don't use regular expressions

    Regular expressions are not always the answer. If you want to match a fixed string then you can do, for example:

    grepl("[", "a[b", fixed = TRUE)
    stringr::str_detect("a[b", stringr::fixed("["))
    stringi::stri_detect_fixed("a[b", "[")
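
The `fixed = TRUE` argument is also accepted by other base R functions that take a pattern, such as `gsub()` and `strsplit()`. A short sketch with an invented input string:

```r
# As a regex, "." matches every character, so everything is replaced.
as_regex <- gsub(".", "-", "a.b.c")
as_regex
## [1] "-----"

# As a fixed string, only the literal dots are replaced.
as_fixed <- gsub(".", "-", "a.b.c", fixed = TRUE)
as_fixed
## [1] "a-b-c"

# Splitting on a literal dot.
parts <- strsplit("a.b.c", ".", fixed = TRUE)[[1]]
parts
## [1] "a" "b" "c"
```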
    