Question
I'm trying to find a smaller string within a larger string, using the pattern String patternString1 = "(John) (.+?)";. The pattern consists of two groups, (John) and (.+?). However, I get a completely different result just by adding a space after (.+?).
For String patternString1 = "(John) (.+?)"; (i.e. without the space), the result is:
found: John w
found: John D
found: John W
For String patternString1 = "(John) (.+?) "; (i.e. with the space), the result is:
found: John writes
found: John Doe
found: John Wayne
How come a space can make such a big difference to the result?
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String text = "John writes about this, and John Doe writes about that,"
        + " and John Wayne writes about everything.";
String patternString1 = "(John) (.+?)";
Pattern pattern = Pattern.compile(patternString1);
Matcher matcher = pattern.matcher(text);
// Print both captured groups for every match found in the text.
while (matcher.find()) {
    System.out.println("found: " + matcher.group(1) + " " + matcher.group(2));
}
Answer 1:
The +? quantifier in .+? is reluctant (or "lazy"). It matches the subpattern it quantifies one or more times, but as few times as necessary to return a valid match.
Your pattern is (John) (.+?) and you try to find a match in John writes about this. The regex engine finds John, stores it in Group 1, matches the space that follows, and then .+? matches the w in writes. One character satisfies the "one or more" requirement, and nothing else remains in the pattern, so the match is already valid and is returned. You get John w.
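To make the "as few as necessary" behaviour concrete, here is a minimal sketch that contrasts the reluctant pattern with a greedy variant. The greedy pattern (John) (.+) is not part of the question and is added only for comparison; the class name LazyVsGreedy and the helper secondGroup are hypothetical names chosen for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyVsGreedy {
    // Returns the text captured by group 2 of the first match, or null if nothing matches.
    static String secondGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        String input = "John writes about this";
        // Reluctant: stops after the single character it is required to match.
        System.out.println("[" + secondGroup("(John) (.+?)", input) + "]");
        // Greedy (comparison only): takes as many characters as it can.
        System.out.println("[" + secondGroup("(John) (.+)", input) + "]");
    }
}
```

With the reluctant quantifier, group 2 should hold only w; the greedy variant should run to the end of the input.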
Now, you add a space after (.+?). John is matched and captured into Group 1 as before, and the space after it is matched by the first space in the pattern (again, as before). Then .+? consumes w, the minimum it needs, and the engine tries to match the trailing space of the pattern. The next character is r, not a space, so the engine goes back to .+? and lets it consume one more character. It then checks whether i is a space, and it is not. The engine extends the match this way, one character at a time, until the next character is a space, which happens right after writes. Thus, the valid match for "(John) (.+?) " is "John writes " (with the trailing space), and Group 2 holds writes.
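The two runs from the question can be reproduced side by side with a small sketch like the one below. The class name SpaceDemo and the helper findAll are hypothetical names introduced for this illustration; the text and both patterns are taken from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SpaceDemo {
    static final String TEXT =
            "John writes about this, and John Doe writes about that,"
            + " and John Wayne writes about everything.";

    // Collects "group1 group2" for every match of the given regex in the input.
    static List<String> findAll(String regex, String input) {
        List<String> results = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            results.add(m.group(1) + " " + m.group(2));
        }
        return results;
    }

    public static void main(String[] args) {
        // Without the trailing space: .+? stops after one character.
        System.out.println(findAll("(John) (.+?)", TEXT));
        // With the trailing space: .+? must extend until a space can follow it.
        System.out.println(findAll("(John) (.+?) ", TEXT));
    }
}
```

The only difference between the two calls is the final space in the pattern, which is what forces .+? to keep extending.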
Answer 2:
Well, if you include the trailing space, you are asking the pattern to match that space as well. John w does not match anymore, because it does not end with a space. It has to be expanded to John writes followed by a space (note that the match includes the space at the end).
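The trailing space is easy to miss in printed output, so the following sketch prints the whole match (group 0) in brackets and compares where the match ends with where Group 2 ends. The class name WholeMatch and its helper methods are hypothetical names used only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WholeMatch {
    // Runs the question's pattern with the trailing space and returns the matcher after the first find().
    static Matcher firstMatch(String input) {
        Matcher m = Pattern.compile("(John) (.+?) ").matcher(input);
        if (!m.find()) {
            throw new IllegalStateException("no match");
        }
        return m;
    }

    // The entire matched text, including anything outside the capturing groups.
    static String whole(String input) {
        return firstMatch(input).group();
    }

    public static void main(String[] args) {
        Matcher m = firstMatch("John writes about this");
        System.out.println("whole match: [" + m.group() + "]");
        System.out.println("group 2:     [" + m.group(2) + "]");
        // The whole match ends one position after group 2, because of the consumed space.
        System.out.println("end of group 2 = " + m.end(2) + ", end of match = " + m.end());
    }
}
```

The space is consumed by the match but sits outside both capturing groups, which is why the printed groups in the question show no trailing space.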
Source: https://stackoverflow.com/questions/35761836/why-adding-a-space-after-can-completely-change-the-result