Regex for square brackets in R

滥情空心 2021-01-18 13:04

Conventionally in R, a regex metacharacter can be matched literally by escaping it with two backslashes, e.g. ( becomes \\(, but I find the same isn't true for square brackets.

mystring         


        
3 Answers
  •  孤城傲影
    2021-01-18 13:43

    You can set perl = TRUE and then use Perl-like syntax, which is more straightforward (IMHO):

    gsub("[\\[\\]$]","",mystring, perl = TRUE)
    

    Or you can use "smart placement": put ] at the very start of the bracket expression, where it is treated as a literal ([ is not special inside a bracket expression, so there is no need to escape it):

    gsub("[][$]","",mystring)
    


    Result:

    [1] "abcde"
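
    As a minimal sketch, the two patterns above can be compared side by side. The input string x is a hypothetical example chosen for illustration; it is not the string from the question, which is not shown here.

```r
# Hypothetical input containing [, ] and $ (an assumption, not the asker's data)
x <- "[a]b$c"

# PCRE: [ and ] are escaped inside the character class
perl_style <- gsub("[\\[\\]$]", "", x, perl = TRUE)

# Default TRE engine: ] comes first in the bracket expression, so it is literal
posix_style <- gsub("[][$]", "", x)

print(perl_style)
print(posix_style)
print(identical(perl_style, posix_style))
```

    Both calls should strip the same three characters; the only difference is which engine interprets the pattern.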
    

    More details

    The TRE regex engine (used by default in base R regex functions such as (g)sub, grep(l) and (g)regexpr when perl=TRUE is not set) treats the [...] construct as a bracket expression, which is a POSIX regex construct. Unlike character classes in NFA regex engines, bracket expressions do not support escape sequences: the \ char is treated as a literal backslash inside them.
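
    One way to probe this, as a rough sketch: test the pattern [\[] against a string that holds nothing but a single backslash. If the backslash is literal inside a bracket expression, the default engine should report a match, while PCRE (which reads \[ as an escaped bracket) should not. The variable names are illustrative only.

```r
backslash <- "\\"   # an R string holding exactly one backslash character

# Default TRE engine: [\[] is read as a bracket expression containing \ and [
tre_hit <- grepl("[\\[]", backslash)

# PCRE: \[ inside the class is an escaped [, so only [ can match
pcre_hit <- grepl("[\\[]", backslash, perl = TRUE)

print(c(TRE = tre_hit, PCRE = pcre_hit))
```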

    Thus, in a TRE regex, [\[\]] consists of the bracket expression [\[\] (equivalent to [\[], which matches either a \ or a [ char) followed by a literal ]. So it matches the substrings \] or []. Try gsub("[\\[\\]]", "", "[]\\]ab]"): it outputs ab] because [] and \] are matched and removed.
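
    The same call can be run under both engines to see the difference. This is a small sketch built on the example string from this answer; regmatches() and gregexpr() are used only to list what the default engine matched.

```r
s <- "[]\\]ab]"   # the characters are: [ ] \ ] a b ]

# Default TRE engine: matches "\ or [" followed by a literal ]
tre_matches <- regmatches(s, gregexpr("[\\[\\]]", s))[[1]]
tre_out     <- gsub("[\\[\\]]", "", s)

# PCRE: the class matches a single [ or ] character
pcre_out <- gsub("[\\[\\]]", "", s, perl = TRUE)

print(tre_matches)   # the two-character substrings removed by TRE
print(tre_out)
print(pcre_out)
```

    With perl = TRUE every bracket is removed and the backslash survives, whereas the default engine removes two-character pairs and leaves the trailing ] behind.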

    Note that the terms "POSIX bracket expression" and "NFA character class" are used here in the same sense as at https://www.regular-expressions.info. This is not quite standard terminology, but there is a need to differentiate between the two.
