Regex for square brackets in R

滥情空心 2021-01-18 13:04

Conventionally in R one can match a metacharacter literally in a regex by escaping it with two backslashes, e.g. ( becomes \\(, but I find the same isn't true for square brackets.

mystring         


        
3 Answers
  • 2021-01-18 13:43

    You should enable perl = TRUE; then you can use Perl-like syntax, which is more straightforward (IMHO):

    gsub("[\\[\\]$]","",mystring, perl = TRUE)
    

    Or you can use "smart placement" and put ] at the start of the bracket expression ([ is not special inside it, so there is no need to escape [ there):

    gsub("[][$]","",mystring)
    

    See demo

    Result:

    [1] "abcde"
    

    More details

    The [...] construct is treated as a bracket expression, a POSIX regex construct, by the TRE regex engine (the default engine in base R regex functions such as (g)sub, grep(l), and (g)regexpr when they are called without perl=TRUE). Bracket expressions, unlike character classes in NFA regex engines, do not support escape sequences, i.e. the \ char is treated as a literal backslash inside them.

    Thus, [\[\]] in a TRE regex matches a \ or [ char (the [\[\] part is a bracket expression equivalent to [\[]) followed by a literal ]. So it matches the substrings \] or []. Have a look at the gsub("[\\[\\]]", "", "[]\\]ab]") demo: it outputs ab] because [] and \] are matched and removed.

    Note that the terms "POSIX bracket expression" and "NFA character class" are used here in the sense used at https://www.regular-expressions.info. This is not quite standard terminology, but there is a need to differentiate between the two.
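
    To make the difference concrete, here is a minimal sketch that runs the same pattern through both engines on the test string from the demo above. The variable names are only illustrative.

```r
# Test string made of the characters: [ ] \ ] a b ]
x <- "[]\\]ab]"

# Default (TRE) engine: the backslashes are literal inside the bracket
# expression, so the pattern means "(\ or [) followed by ]".
tre_out <- gsub("[\\[\\]]", "", x)

# PCRE engine: the backslashes escape [ and ] inside the character class,
# so the pattern means "a single [ or ]".
pcre_out <- gsub("[\\[\\]]", "", x, perl = TRUE)

# "Smart placement" with the default engine: ] comes first, so it is literal.
posix_out <- gsub("[][]", "", x)

print(tre_out)    # "ab]"
print(pcre_out)   # "\\ab" (a single backslash followed by ab)
print(posix_out)  # "\\ab"
```

    With perl = TRUE every bracket is removed and only the backslash survives, which is usually what people expect from [\\[\\]].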

  • 2021-01-18 13:45

    I would sidestep [ab] syntax and use (a|b). Besides working, it may also be more readable:

    gsub("(\\[|\\]|\\$)","",mystring)
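
    As a quick check, the alternation can be compared with the bracket-expression form on a sample string. The value of mystring below is an assumption made for illustration; the question's original value is not shown.

```r
# Assumed sample input (not taken from the question)
mystring <- "abc[d]e$"

# Alternation: each metacharacter is escaped outside any bracket expression
alt_out <- gsub("(\\[|\\]|\\$)", "", mystring)

# Bracket expression with ] placed first, default (TRE) engine
set_out <- gsub("[][$]", "", mystring)

print(alt_out)                      # "abcde"
print(identical(alt_out, set_out))  # TRUE
```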
    
  • 2021-01-18 13:48

    You can just use \\[ as the pattern. You don't need additional square brackets unless you are matching multiple options:

    > mystring <- 'abc[de'
    > gsub("\\[", "", mystring)
    [1] "abcde"
    

    You can make this even simpler and faster for single characters by taking away the special meaning using fixed=TRUE:

    > mystring <- 'abc[de'
    > gsub("[", "", mystring, fixed=TRUE)
    [1] "abcde"
    

    Or, if ] is the first character inside the square brackets (unescaped), it is taken as a literal character rather than as the closing bracket, and [ needs no escaping inside the brackets either:

    > mystring <- 'a,bc[d]e$'
    > gsub("[][,$]", "", mystring)
    [1] "abcde"
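
    Since fixed=TRUE treats the whole pattern as one literal string, removing several different characters that way takes one call per character. A rough sketch, where strip_chars is a hypothetical helper name and not a base R function:

```r
# Hypothetical helper: remove every literal character listed in `chars`
# from `x`, one fixed (non-regex) gsub call per character.
strip_chars <- function(x, chars) {
  for (ch in chars) {
    x <- gsub(ch, "", x, fixed = TRUE)
  }
  x
}

cleaned <- strip_chars("a,bc[d]e$", c("[", "]", ",", "$"))
print(cleaned)  # "abcde"
```

    For more than a couple of characters the single bracket expression above is shorter, but the fixed version avoids thinking about regex escaping at all.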
    