Unexpected match of regex

执笔经年 2020-12-03 21:56

I expect the regex pattern ab{,2}c to match only with a followed by 0, 1 or 2 bs, followed by c.

It works that wa

3 Answers
  • 2020-12-03 22:09

    The behavior with {,2} is not expected; it is a bug. If you look at the tre_parse_bound function in the TRE source code, you will see that the min variable is set to -1 before the engine tries to initialize the minimum bound. When the minimum value is missing from the quantifier, the number of "repeats" appears to be the maximum value + 1 (as if the repeat count were max - min = max - (-1) = max + 1).

    So, a{,} matches one occurrence of a, as do a{, } and a{ , }. In the R demo below, only abc is matched with ab{,}c:

    grepl("ab{,}c", c("ac", "abc", "abbc", "abbbc", "abbbbc"))
    grepl("ab{, }c", c("ac", "abc", "abbc", "abbbc", "abbbbc"))
    grepl("ab{ ,   }c", c("ac", "abc", "abbc", "abbbc", "abbbbc"))
    ## => [1] FALSE  TRUE FALSE FALSE FALSE
    
  • 2020-12-03 22:11

    Just as an addition:

    vec1 = c('','a', 'aa', 'aaa', 'aaaa', 'aaaaa', 'aaaaaa','aaaaaaa')
    
    grep("^a{,1}$", vec1, value = TRUE) # seems to "become" ^a{1}$
    grep("^a{,2}$", vec1, value = TRUE) # seems to "become" ^a{0,3}$
    grep("^a{,3}$", vec1, value = TRUE) # seems to "become" ^a{0,4}$
    grep("^a{,4}$", vec1, value = TRUE) # seems to "become" ^a{0,5}$
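
    To check this on your own installation, the two forms can be compared side by side. This is only a sketch: matches_for is a hypothetical helper, not part of the answer, and what {,m} matches may differ between R versions and regex engines, so no output is shown.

    ```r
    vec1 <- c("", "a", "aa", "aaa", "aaaa", "aaaaa", "aaaaaa", "aaaaaaa")

    # Hypothetical helper (an assumption for illustration): anchor the
    # quantifier and return the elements of vec1 that match.
    matches_for <- function(quantifier) {
      grep(paste0("^a", quantifier, "$"), vec1, value = TRUE)
    }

    for (m in 1:4) {
      implicit <- matches_for(sprintf("{,%d}", m))   # lower bound omitted
      explicit <- matches_for(sprintf("{0,%d}", m))  # documented {n,m} form
      # Print the lengths of the matched strings for each form.
      cat(sprintf("{,%d}  matched lengths: %s\n", m, toString(nchar(implicit))))
      cat(sprintf("{0,%d} matched lengths: %s\n", m, toString(nchar(explicit))))
    }
    ```

    If the two lines printed for a given m differ, the omitted lower bound is not being read as 0 on that setup.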
    
  • 2020-12-03 22:17

    I am writing this as an answer because, unfortunately, I can't add a comment.

    Update: Following the answer by Wiktor Stribiżew and the feedback, it seems the behavior is categorized as a bug.

    Original: The syntax you are using is just not supported in R (assuming the default engine). This is why you are getting unexpected results.

    • The supported syntax is {n,m}, as the documentation states. Thus, you need to specify both bounds, e.g. {0,2}, which returns the correct result.
    • The syntax {,m}, on the other hand, is missing from the regex documentation, which implicitly suggests that it is not supported.
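
    As a minimal sketch of that workaround, applied to the test vector from the accepted answer (the variable names x and explicit are only illustrative):

    ```r
    x <- c("ac", "abc", "abbc", "abbbc", "abbbbc")

    # Spell out the lower bound so the quantifier uses the documented {n,m} form.
    explicit <- grepl("ab{0,2}c", x)
    print(explicit)
    ## => [1]  TRUE  TRUE  TRUE FALSE FALSE
    ```

    With both bounds given, a followed by zero, one or two bs and then c matches, and the strings with three or more bs do not.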

    In case you would like to explore differences in syntax, I would recommend taking a look at the regular-expressions.info comparison page. (You need to compare Python and R in terms of Quantifiers in this case.)
