Regex lazy quantifier behaves greedily

逝去的感伤 2020-11-29 13:50

I have text like this:

[Some Text][1][Some Text][2][Some Text][3][Some Text][4]

I want to match [Some Text][2] with this regex:

2 Answers
  • 2020-11-29 14:27

    You could try the regex below:

    (?!^)(\[[A-Z].*?\]\[\d+\])    
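
    As a rough check of what this pattern does, here is a minimal Python sketch using the standard re module and the sample text from the question. The variable names are only illustrative.

```python
import re

text = "[Some Text][1][Some Text][2][Some Text][3][Some Text][4]"

# (?!^) rejects any match that starts at the very beginning of the string,
# so the first candidate, "[Some Text][1]", is skipped.
pattern = re.compile(r"(?!^)(\[[A-Z].*?\]\[\d+\])")

first = pattern.search(text).group(1)
all_matches = pattern.findall(text)

print(first)        # [Some Text][2]
print(all_matches)  # ['[Some Text][2]', '[Some Text][3]', '[Some Text][4]']
```

    Note that the pattern does not target [2] specifically: it only excludes a match at position 0, so it returns [Some Text][2] first just because that pair is second in the string.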
    


  • 2020-11-29 14:29

    The \[.*?\]\[2\] pattern works like this:

    • \[ - finds the leftmost [ (as the regex engine processes the string input from left to right)
    • .*? - matches any 0+ chars other than line break chars, as few as possible, but as many as the subsequent patterns need for the overall match to succeed
    • \]\[2\] - matches the literal ][2] substring.

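    So, .*? is expanded one char at a time after each failure until the engine finds the leftmost ][2]. Because the match already started at the first [ in the string, the result is [Some Text][1][Some Text][2] rather than [Some Text][2]. Note that lazy quantifiers do not guarantee the "shortest" match: they only make the match end as early as possible from a start position that is already fixed.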
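
    The behaviour described above can be reproduced with a short Python sketch (standard re module, sample text taken from the question):

```python
import re

text = "[Some Text][1][Some Text][2][Some Text][3][Some Text][4]"

# The engine commits to the leftmost "[" and then lets .*? grow
# until "][2]" can be matched.
lazy = re.search(r"\[.*?\]\[2\]", text)

print(lazy.start())   # 0
print(lazy.group(0))  # [Some Text][1][Some Text][2]
```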

    Solution

    Instead of .*? (or .*), use a negated character class that matches any char except the boundary chars.

    \[[^\]\[]*\]\[2\]
    


    Here, .*? is replaced with [^\]\[]* - 0 or more chars other than ] and [.
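
    A minimal Python sketch of the negated character class version, run on the sample text from the question (variable names are illustrative):

```python
import re

text = "[Some Text][1][Some Text][2][Some Text][3][Some Text][4]"

# [^\]\[]* cannot cross a "[" or "]", so the match cannot start
# in one bracket pair and run over into the next one.
m = re.search(r"\[[^\]\[]*\]\[2\]", text)

print(m.start())   # 14
print(m.group(0))  # [Some Text][2]
```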

    Other examples:

    • <[^<>]*> matches <...> with no < and > inside
    • \([^()]*\) matches (...) with no ( and ) inside
    • "[^"]*" matches "..." with no " inside
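
    The three patterns in the list above can be tried out in Python like this. The input strings are made up for illustration and are not from the original question.

```python
import re

# Each pattern forbids its own delimiters inside the match,
# so only the innermost delimited pieces are returned.
angles = re.findall(r"<[^<>]*>", "a <b> c <d <e> f")
parens = re.findall(r"\([^()]*\)", "f(g(x), (y))")
quotes = re.findall(r'"[^"]*"', 'say "hi" and "bye"')

print(angles)  # ['<b>', '<e>']
print(parens)  # ['(x)', '(y)']
print(quotes)  # ['"hi"', '"bye"']
```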

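    In other situations, when the starting delimiter is a multi-character string or a complex pattern, use a tempered greedy token, (?:(?!start).)*?. It matches any char, one at a time, as long as that char does not begin another occurrence of start. For example, to match abc 1 def in abc 0 abc 1 def, use abc(?:(?!abc).)*?def.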
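
    A short Python sketch comparing the tempered greedy token with a plain lazy quantifier on the abc ... def example:

```python
import re

text = "abc 0 abc 1 def"

# Plain lazy dot: starts at the first "abc" and expands up to "def".
lazy = re.search(r"abc.*?def", text)

# Tempered token: each consumed char must not be the start of another "abc",
# so the attempt from the first "abc" fails and the second one is used.
tempered = re.search(r"abc(?:(?!abc).)*?def", text)

print(lazy.group(0))      # abc 0 abc 1 def
print(tempered.group(0))  # abc 1 def
```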
