Given a string containing some number of square brackets and other characters, I want to find all closing square brackets preceded by an opening square bracket and some number o
\[ is only a single character, so it seems like the obvious maximum length should be 1 + whatever the obvious maximum length was of the look-behind group in the first expression. What gives?
That's the point: "whatever the obvious maximum length was of the look-behind group in the first expression" is not obvious. A rule of thumb is that you can't use + or * inside a look-behind. This holds not only for Java's regex engine, but for many other PCRE-flavored engines (even Perl's (v5.10) engine!).
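If an upper bound on the number of letters is acceptable, a bounded quantifier keeps the look-behind within that rule of thumb. The following is only a sketch: the class name, the method name, and the limit of 10 letters are assumptions made for illustration, not something from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundedLookBehind {
    // Assumption: at most 10 letters sit between '[' and ']'.
    // {1,10} gives the look-behind a known maximum length, unlike '+'.
    private static final Pattern CLOSING =
            Pattern.compile("(?<=\\[[a-z]{1,10})]");

    // Returns the index of every ']' preceded by '[' and 1 to 10 lowercase letters.
    static List<Integer> closingBracketIndexes(String input) {
        List<Integer> indexes = new ArrayList<>();
        Matcher m = CLOSING.matcher(input);
        while (m.find()) {
            indexes.add(m.start()); // the match itself is just the ']'
        }
        return indexes;
    }

    public static void main(String[] args) {
        // prints [6]
        System.out.println(closingBracketIndexes("] [abc] [123] abc]"));
    }
}
```

The bound is arbitrary; runs of more than 10 letters would be missed, so pick a limit that fits the real input.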
You can do this with look-aheads however:
Pattern p = Pattern.compile("(?=(\\[[a-z]+]))");
Matcher m = p.matcher("] [abc] [123] abc]");
while (m.find()) {
    System.out.println("Found a ']' before index: " + m.end(1));
}
(I.e. a capture group inside a look-ahead (!), which can be used to get the end(...) of the group.)
This will print:
Found a ']' before index: 7
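The same look-ahead idea can be wrapped in a small helper that collects the position of each matching ] instead of printing it. This is a minimal sketch; the class and method names are made up for illustration, and it subtracts 1 from end(1) because end(1) points just past the bracket.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookAheadCapture {
    // The look-ahead consumes nothing; group 1 captures "[letters]".
    private static final Pattern BRACKETED =
            Pattern.compile("(?=(\\[[a-z]+]))");

    // Returns the index of every ']' that closes a "[letters]" group.
    static List<Integer> closingBracketIndexes(String input) {
        List<Integer> indexes = new ArrayList<>();
        Matcher m = BRACKETED.matcher(input);
        while (m.find()) {
            // end(1) is one past the ']', so step back by one.
            indexes.add(m.end(1) - 1);
        }
        return indexes;
    }

    public static void main(String[] args) {
        // prints [6, 23]
        System.out.println(closingBracketIndexes("] [abc] [123] abc] [foo] bar]"));
    }
}
```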
And if you're interested in replacing such ]'s, you could do something like this:
String s = "] [abc] [123] abc] [foo] bar]";
System.out.println(s);
System.out.println(s.replaceAll("(\\[[a-z]+)]", "$1_"));
which will print:
] [abc] [123] abc] [foo] bar]
] [abc_ [123] abc] [foo_ bar]
"^[^\[]*\[[^\]]*?(\])"
Is group(1) what you want?
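To see what group(1) of that expression refers to, here is a small sketch that reports its start index. The class and method names are hypothetical, and the backslashes are doubled on the assumption that the pattern is written as a Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstClosingBracket {
    // Same expression as above, escaped for a Java string literal.
    private static final Pattern FIRST =
            Pattern.compile("^[^\\[]*\\[[^\\]]*?(\\])");

    // Returns the index of the ']' captured by group 1, or -1 if there is no match.
    static int firstClosingIndex(String input) {
        Matcher m = FIRST.matcher(input);
        return m.find() ? m.start(1) : -1;
    }

    public static void main(String[] args) {
        // prints 6
        System.out.println(firstClosingIndex("] [abc] [123] abc]"));
    }
}
```

Because the expression is anchored with ^, it yields only the first ] that follows the first [, and it does not require letters between the brackets, so "[123]" would match as well.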