Why does the look-behind expression in this regex not have an “obvious maximum length”?

太阳男子 2021-01-23 09:23

Given a string containing some number of square brackets and other characters, I want to find all closing square brackets preceded by an opening square bracket and some number o

2 answers
  •  一个人的身影
    2021-01-23 10:07

    \[ is only a single character, so it seems like the obvious maximum length should be 1 + whatever the obvious maximum length was of the look-behind group in the first expression. What gives?

    That's the point: "whatever the obvious maximum length was of the look-behind group in the first expression" is not obvious. The engine needs a finite upper bound on how far back the look-behind can reach, and + and * have none. A rule of thumb is that you can't use + or * inside a look-behind. This holds not only for Java's regex engine, but also for many other PCRE-flavored engines (even Perl's (v5.10) engine!).
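If the number of letters between the brackets has a known upper limit, a bounded quantifier gives the look-behind a maximum length the engine can compute. The following is a minimal sketch, not part of the original answer: the class name `BoundedLookBehind`, the method `closingBracketIndexes`, and the limit of 10 letters are all assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundedLookBehind {
    // Assumption: at most 10 letters between '[' and ']'. The explicit upper
    // bound {1,10} is what lets the engine work out a maximum look-behind length.
    private static final Pattern CLOSING =
            Pattern.compile("(?<=\\[[a-z]{1,10})]");

    // Returns the index of every ']' preceded by '[' and 1 to 10 lowercase letters.
    public static List<Integer> closingBracketIndexes(String input) {
        List<Integer> indexes = new ArrayList<>();
        Matcher m = CLOSING.matcher(input);
        while (m.find()) {
            indexes.add(m.start()); // the match itself is the single ']'
        }
        return indexes;
    }

    public static void main(String[] args) {
        // prints [6]
        System.out.println(closingBracketIndexes("] [abc] [123] abc]"));
    }
}
```

The trade-off is that the limit is baked into the pattern: a bracketed word longer than the assumed 10 letters is silently skipped.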

    You can do this with look-aheads however:

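    // Requires java.util.regex.Pattern and java.util.regex.Matcher.
    // The capture group sits inside a zero-width look-ahead, so nothing is consumed.
    Pattern p = Pattern.compile("(?=(\\[[a-z]+]))");
    Matcher m = p.matcher("] [abc] [123] abc]");
    while (m.find()) {
      // end(1) is the index just past the ']' that closes group 1
      System.out.println("Found a ']' before index: " + m.end(1));
    }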
    

    (That is, a capture group inside a look-ahead (!), whose end(1) gives the index just past the matched group.)

    This prints:

    Found a ']' before index: 7
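Since end(1) points one past the bracket, the position of the ']' itself is end(1) - 1. The sketch below wraps the same look-ahead pattern in a helper that collects those positions. The class name `LookAheadIndexes` and the method `closingBracketIndexes` are hypothetical names introduced here, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookAheadIndexes {
    // Same pattern as in the answer: a capture group inside a look-ahead.
    private static final Pattern BRACKETED =
            Pattern.compile("(?=(\\[[a-z]+]))");

    // Returns the index of each ']' that closes a "[letters]" group.
    public static List<Integer> closingBracketIndexes(String input) {
        List<Integer> indexes = new ArrayList<>();
        Matcher m = BRACKETED.matcher(input);
        while (m.find()) {
            // end(1) is one past the closing ']', so step back by one
            indexes.add(m.end(1) - 1);
        }
        return indexes;
    }

    public static void main(String[] args) {
        // prints [6, 23]
        System.out.println(closingBracketIndexes("] [abc] [123] abc] [foo] bar]"));
    }
}
```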

    EDIT

    And if you're interested in replacing such ]'s, you could do something like this:

    String s = "] [abc] [123] abc] [foo] bar]";
    System.out.println(s);
    System.out.println(s.replaceAll("(\\[[a-z]+)]", "$1_"));
    

    which will print:

    ] [abc] [123] abc] [foo] bar]
    ] [abc_ [123] abc] [foo_ bar]
