`regex{n,}?` == `regex{n}`?

灰色年华 2021-01-20 15:06

-edit- NOTE the ? at the end of .{2,}?

I found out you can write

.{2,}?

Isn't that exactly the same as .{2}?

7 answers
  • 2021-01-20 15:18

    Not exactly. Here is a PHP script that runs both patterns and dumps the matches:

    <?php
    $string = 'aaabbaabbbaaa';

    // Exactly two b's followed by an a
    preg_match_all('/b{2}a/', $string, $matches, PREG_SET_ORDER);

    echo '<pre>';
    var_dump($matches);
    echo '</pre>';

    // Two or more b's (as few as possible) followed by an a
    preg_match_all('/b{2,}?a/', $string, $matches, PREG_SET_ORDER);

    echo '<pre>';
    var_dump($matches);
    echo '</pre>';
    

    The first result gives:

    array(2) {
      [0]=>
      array(1) {
        [0]=>
        string(3) "bba"
      }
      [1]=>
      array(1) {
        [0]=>
        string(3) "bba"
      }
    }
    

    The second gives:

    array(2) {
      [0]=>
      array(1) {
        [0]=>
        string(3) "bba"
      }
      [1]=>
      array(1) {
        [0]=>
        string(4) "bbba"
      }
    }
    

    With b{2} the match always contains exactly two b's; with b{2,}? it contains two or more, as many as are needed to reach the following a.

  • 2021-01-20 15:25

    What makes this question especially interesting is that there are times when .{2,}? is equivalent to .{2}, but those are exactly the places where a reluctant quantifier should never be used. Others have already pointed out that a reluctant quantifier at the very end of a regex always matches the minimum number of characters, because there's nothing after it to force it to consume more.

    The other place they shouldn't be used is at the end of a subexpression inside an atomic group. For example, suppose you try to match foo bar with

    f(?>.+?) bar
    

    The subexpression initially consumes the first 'o' and hands off to the next part, which tries unsuccessfully to match a space. Without the atomic group, it would backtrack and let the .+? consume another character. But it can't backtrack into the atomic group, and there's no wiggle room before the group, so the match attempt fails.

    A reluctant quantifier at the end of a regex or at the end of an atomic subexpression is a definite code smell.
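Both cases can be checked with a small sketch in Python's re module. It rests on an assumption that is not part of the answer above: the atomic group is emulated with the lookahead-plus-backreference idiom (?=(.+?))\1, because the (?>...) syntax is only accepted by Python 3.11 and later. The variable names are illustrative.

```python
import re

text = "foo bar"

# A reluctant quantifier at the very end of a pattern stops at its minimum,
# so in this position .{2,}? and .{2} return the same match.
lazy_tail = re.search(r".{2,}?", text).group()
exact_tail = re.search(r".{2}", text).group()

# Without an atomic group, .+? is allowed to grow until " bar" matches.
plain = re.search(r"f.+? bar", text)

# Emulated atomic group (assumption: stands in for f(?>.+?) bar).
# The lookahead captures the lazy match and \1 consumes it; the engine
# cannot backtrack into the lookahead, so .+? stays stuck at a single "o".
atomic = re.search(r"f(?=(.+?))\1 bar", text)

print(lazy_tail, exact_tail)  # fo fo
print(plain.group())          # foo bar
print(atomic)                 # None
```

On Python 3.11 or later the pattern f(?>.+?) bar should be usable directly in place of the emulation.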

  • 2021-01-20 15:29

    x.{2,}?x matches "xasdfx" in "xasdfxbx" but x.{2}x does not match at all.

    Without the trailing ?, the first one will match the whole string.
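A quick way to confirm the three behaviours described in this answer, sketched with Python's re module (the choice of engine and the variable names are mine, not the answerer's):

```python
import re

text = "xasdfxbx"

lazy = re.search(r"x.{2,}?x", text)    # two or more characters, as few as possible
exact = re.search(r"x.{2}x", text)     # exactly two characters between the x's
greedy = re.search(r"x.{2,}x", text)   # two or more characters, as many as possible

print(lazy.group())    # xasdfx
print(exact)           # None
print(greedy.group())  # xasdfxbx
```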

  • 2021-01-20 15:34

    No, they are different:

    .{2,}? : Any character, at least 2 repetitions, as few as possible

    .{2} : Any character, exactly 2 repetitions

  • 2021-01-20 15:37

    No, they are different. ^.{2,}?$ matches strings whose length is at least 2 (as seen on rubular.com):

    12
    123
    1234
    

    By contrast, ^.{2}$ only matches strings whose length is exactly 2 (as seen on rubular.com).

    It's correct that being reluctant, .{2,}? will first attempt to match only two characters. But for the overall pattern to match, it can take more. This is not the case with .{2}, which can only match exactly 2 characters.
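The same comparison can be reproduced locally instead of on rubular.com. This is a minimal sketch using Python's re module; the one-character sample "1" is my addition, included to show the lower bound.

```python
import re

samples = ["1", "12", "123", "1234"]

# Anchored lazy version: the $ forces .{2,}? to keep expanding to the end.
lazy_hits = [s for s in samples if re.search(r"^.{2,}?$", s)]

# Anchored exact version: only two-character strings can satisfy it.
exact_hits = [s for s in samples if re.search(r"^.{2}$", s)]

print(lazy_hits)   # ['12', '123', '1234']
print(exact_hits)  # ['12']
```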

    References

    • regular-expressions.info/Repetition

    Related questions

    • Difference between .*? and .* for regex
  • 2021-01-20 15:38

    In isolation they probably behave identically, but not inside larger expressions, because the lazy version is allowed to match more than two symbols when the rest of the pattern requires it.

                 abx        abcx
    
    ^.{2,}?x$    match      match
    ^.{2}x$      match      no match
    