POSIX character class does not work in base R regex

后悔当初 2020-11-28 16:06

I'm having some problems matching a pattern with a string of text in R.

I'm trying to get TRUE with grepl when the text is s

1 Answer
  • 2020-11-28 16:46

    Although the ICU regex engine used by stringr supports bare POSIX character classes in a pattern, the base R regex flavors (both PCRE (perl=TRUE) and TRE) require POSIX character classes to be placed inside bracket expressions: [:alnum:] -> [[:alnum:]].

    x <- c("AZaz09 y AZaz09", "ĄŻaz09 y AZŁł09", "26 de Marzo y Pareyra de la Luz")
    grepl("[[:alnum:][:blank:]]+[[:blank:]][yY][[:blank:]][[:alnum:][:blank:]]+", x)
    ## => [1] TRUE TRUE TRUE
    grepl("[[:alnum:][:blank:]]+[[:blank:]][yY][[:blank:]][[:alnum:][:blank:]]+", x, perl=TRUE)
    ## => [1] TRUE TRUE TRUE
    

    When you use [:alnum:] on its own, it is parsed as an ordinary bracket expression that matches a single character out of the set :, a, l, n, u, m.
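
    To make the difference concrete, here is a minimal sketch that runs both spellings against a few one-character strings. The test vector s and the variable names are made up for illustration, and the sketch only uses the default (TRE) engine, so it says nothing about how perl=TRUE treats the bare form.

```r
# Hypothetical one-character inputs chosen to separate the two readings
s <- c("m", "z", "5", ":")

# Bare form: read as the set { ':', 'a', 'l', 'n', 'u', 'm' }
bare <- grepl("[:alnum:]", s)
print(bare)
## => [1]  TRUE FALSE FALSE  TRUE

# Proper form: the POSIX class nested inside a bracket expression
proper <- grepl("[[:alnum:]]", s)
print(proper)
## => [1]  TRUE  TRUE  TRUE FALSE
```

    Only "m" and ":" match the bare form because they happen to be members of that literal set, while "z" and "5" are alphanumeric but are not in it.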

    Pattern details:

    • [[:alnum:][:blank:]]+ - one or more alphanumeric or horizontal whitespace characters
    • [[:blank:]] - a single horizontal whitespace character (space or tab)
    • [yY] - either y or Y
    • [[:blank:]] - a single horizontal whitespace character (space or tab)
    • [[:alnum:][:blank:]]+ - one or more alphanumeric or horizontal whitespace characters
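
    The breakdown above can also be mirrored in code by assembling the same pattern from named pieces. This is only a sketch: the names word_run, blank and pattern, and the two extra test strings, are illustrative and not part of the original answer.

```r
# Named pieces corresponding to the items in the breakdown
word_run <- "[[:alnum:][:blank:]]+"  # one or more alphanumerics or blanks
blank    <- "[[:blank:]]"            # exactly one space or tab

# Same pattern as in the grepl() calls above, built with paste0()
pattern <- paste0(word_run, blank, "[yY]", blank, word_run)

# One string from the answer plus two hypothetical ones
x <- c("26 de Marzo y Pareyra de la Luz",  # " y " surrounded by blanks
       "Marzo yPareyra",                   # no blank after the y
       "Marzo Y Luz")                      # uppercase Y is allowed by [yY]
res <- grepl(pattern, x)
print(res)
## => [1]  TRUE FALSE  TRUE
```

    The second string fails because the y must be followed by a blank, which is what the second [[:blank:]] enforces.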