Question
I posted this question earlier.
But that wasn't quite the end of it. All the rules that applied there still apply.
So the strings:
"%ABC%"
would yield ABC as a result (capture stuff between percent signs), as would
"$ABC."
(capture stuff after $, giving up when another dollar or dot appears).
"$ABC$XYZ"
would too, and also give XYZ as a result.
To add a bit more to this:
"${ABC}"
should yield ABC too (ignore curly braces if present; non-capture chars perhaps?). If you have two successive dollar signs, such as
"$$EFG"
or
"$${EFG}"
that should not appear in a regex result. (This is where either numbered or named back-references come into play, and the reason I contemplated them as non-capture groups.) As I understand it, a group becomes a non-capture group with the syntax (?:).
1) Can I say the % or $ is a non-capture group and reference that by number? Or do only capture groups get allocated numbers?
2) What is the order of the numbering if you have ((A) (B) (C))? Is the outer group 1, A 2, B 3, C 4?
I have been looking at named groups. I saw the syntax mentioned here:
(?<name>capturing text)
to define a named group "name"
\k<name>
to backreference a named group "name"
3) Not sure if a non-capture group can be named in Java? Can someone elucidate?
- More info here on non capture groups.
- More info here on lookbehinds
- Similar answer to a question here, but didn't quite get me what I wanted. Not sure if there is a back-reference issue in Java.
- Similar question here. But could not get my head around the working version to apply to this.
I have used the exact same Java I had in my original question, except for:
String search = "/bla/$V_N.$$XYZ.bla";
String pattern = "(?:(?<oc>[%$]))(?!(\\k<oc>))([^%.$]*)+";
This should only result in V_N.
I am really struggling with this one, and wondered if someone can help me work out how to solve this. Thanks.
Answer 1:
You may write a slightly more verbose regex with multiple capturing groups and only grab those that are not null, or plainly concatenate the found group values, since only one of them will ever be initialized for each match:
%([^%.]+)%|(?<!\$)\$(?:\{([^{}]+)\}|([^$.]+))
See the regex demo.
Details
- %([^%.]+)% - a %, then Group 1: one or more chars other than % and ., then a % is consumed
- | - or
- (?<!\$) - a negative lookbehind that matches a location in the string that is not immediately preceded by $
- \$ - a $
- (?: - start of the non-capturing container group matching either of:
  - \{([^{}]+)\} - a {, then Group 2: one or more chars other than { and }, then a } is consumed
  - | - or
  - ([^$.]+) - Group 3: one or more chars other than $ and .
- ) - end of the non-capturing container group.
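To see what the negative lookbehind contributes, here is a minimal sketch that runs only the `$` branch of the pattern against the search string from the question, once without and once with `(?<!\$)`. The class name `LookbehindDemo` and the helper `findAll` are made up for illustration, and the brace alternative is left out to keep it short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookbehindDemo {
    // Hypothetical helper: collects group 1 of every match of regex in input.
    public static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String search = "/bla/$V_N.$$XYZ.bla";
        // Without the lookbehind, the second "$" of "$$" still starts a match.
        System.out.println(findAll("\\$([^$.]+)", search));         // [V_N, XYZ]
        // With (?<!\$), a "$" that directly follows another "$" is rejected.
        System.out.println(findAll("(?<!\\$)\\$([^$.]+)", search)); // [V_N]
    }
}
```

The first `$` of `$$` never matches in either case, because `[^$.]+` needs at least one character that is not `$`. The lookbehind is what stops the second one.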
Java usage:
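import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String regex = "%([^%.]+)%|(?<!\\$)\\$(?:\\{([^\\{}]+)\\}|([^$.\\s]+))";
        String string = "%ABC%\n$ABC.\n$ABC$XYZ ${ABC}\n\n$$EFG $${EFG}.";
        Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
        Matcher m = pattern.matcher(string);
        List<String> results = new ArrayList<>();
        while (m.find()) {
            // Only one of groups 1-3 is non-null per match, so concatenating them yields that value
            results.add(Objects.toString(m.group(1), "") +
                        Objects.toString(m.group(2), "") +
                        Objects.toString(m.group(3), ""));
        }
        System.out.println(results); // => [ABC, ABC, ABC, XYZ, ABC]
    }
}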
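The other option mentioned above, grabbing whichever group is not null instead of concatenating, could look roughly like this. It is only a sketch: the class name `FirstNonNullGroup` and the method `extract` are hypothetical, and the regex is the same one used in the Java usage.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstNonNullGroup {
    private static final Pattern P = Pattern.compile(
            "%([^%.]+)%|(?<!\\$)\\$(?:\\{([^\\{}]+)\\}|([^$.\\s]+))");

    // Hypothetical helper: for each match, keep the first group that took part in it.
    public static List<String> extract(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = P.matcher(input);
        while (m.find()) {
            for (int g = 1; g <= m.groupCount(); g++) {
                if (m.group(g) != null) { // groups in the unused alternatives are null
                    out.add(m.group(g));
                    break;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(extract("/bla/$V_N.$$XYZ.bla")); // [V_N]
    }
}
```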
Mind that in regular Java string literals, \ should be escaped (i.e. \\) to introduce a single literal backslash that is used as part of regex escapes.
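On the numbering questions raised in the question: in java.util.regex, capturing groups are numbered by counting their opening parentheses from left to right, a (?:...) group gets no number, and a named group (?<name>...) is by definition a capturing group that also receives a number. A non-capturing group therefore cannot be named or back-referenced. A small sketch to check this; the class name `GroupNumbering`, the helper `groups`, and the sample patterns are illustrative only, and the spaces from ((A) (B) (C)) are dropped.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupNumbering {
    // Hypothetical helper: returns groups 1..n of a full match, or null if the input does not match.
    public static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return null;
        }
        String[] out = new String[m.groupCount()];
        for (int g = 1; g <= m.groupCount(); g++) {
            out[g - 1] = m.group(g);
        }
        return out;
    }

    public static void main(String[] args) {
        // Outer group is 1, then A is 2, B is 3, C is 4.
        System.out.println(String.join(",", groups("((A)(B)(C))", "ABC"))); // ABC,A,B,C

        // (?:x) is not counted; the named group is group 1 and (z) is group 2.
        Matcher m = Pattern.compile("(?:x)(?<name>y)(z)").matcher("xyz");
        if (m.matches()) {
            System.out.println(m.groupCount());   // 2
            System.out.println(m.group("name"));  // y
            System.out.println(m.group(1));       // y
        }
    }
}
```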
Source: https://stackoverflow.com/questions/58827094/java-regex-capture-string-with-single-dollar-but-not-when-it-has-two-successi