Why adding a space after `(.+?)` can completely change the result

Submitted by 坚强是说给别人听的谎言 on 2019-12-10 22:13:42

Question


I'm trying to find a smaller string within a larger string using the pattern String patternString1 = "(John) (.+?)";. The pattern consists of two groups, (John) and (.+?). However, I get a completely different result just by adding a space after (.+?).

For String patternString1 = "(John) (.+?)"; (i.e. without the space), the result is:

found: John w
found: John D
found: John W

For String patternString1 = "(John) (.+?) "; (i.e. with the space), the result is:

found: John writes
found: John Doe
found: John Wayne

How come a space can make such a big difference to the result?

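import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String text
                = "John writes about this, and John Doe writes about that,"
                + " and John Wayne writes about everything.";

        // Variant without the trailing space; use "(John) (.+?) " for the second result.
        String patternString1 = "(John) (.+?)";
        Pattern pattern = Pattern.compile(patternString1);
        Matcher matcher = pattern.matcher(text);

        // Print both capture groups for every match found in the text.
        while (matcher.find()) {
            System.out.println("found: " + matcher.group(1) + " " + matcher.group(2));
        }
    }
}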

Answer 1:


The +? quantifier in .+? is reluctant (or "lazy"). It matches the subpattern it quantifies one or more times, but as few times as necessary for the overall pattern to produce a valid match.
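To make "as few times as necessary" concrete, here is a minimal sketch that runs the greedy form (.+) and the reluctant form (.+?) against the text from the question. The class name GreedyVsReluctant and the helper secondGroups are made up for this illustration; only Pattern and Matcher come from java.util.regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsReluctant {
    // Same text as in the question.
    static final String TEXT = "John writes about this, and John Doe writes about that,"
            + " and John Wayne writes about everything.";

    // Hypothetical helper: collects group 2 of every match of regex in input.
    static List<String> secondGroups(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        // Greedy: .+ takes as much as it can, so a single match swallows the rest of the text.
        System.out.println(secondGroups("(John) (.+)", TEXT));
        // Reluctant: .+? stops as soon as the overall pattern is satisfied.
        System.out.println(secondGroups("(John) (.+?)", TEXT)); // [w, D, W]
    }
}
```

With the greedy quantifier there is only one match, because the first .+ already consumes the two later occurrences of John.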

Your pattern is (John) (.+?), and you are looking for a match in John writes about this. The regex engine finds John and captures it into Group 1, matches the following space, and then reaches w in writes. .+? matches w, so the "one or more" requirement is met. Nothing follows .+? in the pattern, so the match is already valid and is returned as is. You get John w.

Now add a space after (.+?). John is matched and captured into Group 1 as before, and the space in the pattern matches the space after it, again as before. Then .+? runs: since + requires at least one character, it first consumes only w in writes. The engine moves on to the trailing space in the pattern, but the next character is r, not a space, so it backtracks into .+? and lets it consume one more character. It then checks whether i is a space, and so on: the engine extends .+? one character at a time until the next character is a space, which first happens right after writes. Thus, the match for (John) (.+?) followed by a space is John writes, including the trailing space, and Group 2 holds writes.
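The expansion described above can be imitated by hand. The following is only a sketch of the idea, not how java.util.regex is implemented: the class ReluctantTrace and the method expandUntilSpace are hypothetical names, and the loop ignores the fact that . does not match line terminators by default.

```java
public class ReluctantTrace {
    // Imitates .+? followed by a literal space: always take the character at
    // 'start' (the "one or more" requirement), then grow one character at a
    // time until the next character is a space.
    // Returns the text .+? would capture, or null if no space follows.
    static String expandUntilSpace(String input, int start) {
        for (int end = start + 1; end < input.length(); end++) {
            if (input.charAt(end) == ' ') {
                return input.substring(start, end);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String input = "John writes about this";
        int start = "John ".length(); // position right after "John "
        System.out.println(expandUntilSpace(input, start)); // writes
    }
}
```

Without the trailing space in the pattern there is nothing to wait for, so the expansion stops after the very first character.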




Answer 2:


Well, if you include the trailing space, you are asking the pattern to match that space as well.

John w does not match anymore, because it does not end with a space.

It has to be expanded to John writes (note that the match includes the space at the end).
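The trailing space is easy to miss in the question's output, because that code prints only Groups 1 and 2. A small sketch that prints the whole match (group 0) in brackets makes it visible; TrailingSpaceMatch and firstFullMatch are names invented for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TrailingSpaceMatch {
    // Hypothetical helper: returns the full match (group 0) of the first hit,
    // wrapped in brackets so a trailing space is visible; null if nothing matches.
    static String firstFullMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? "[" + m.group() + "]" : null;
    }

    public static void main(String[] args) {
        String input = "John writes about this";
        System.out.println(firstFullMatch("(John) (.+?)", input));  // [John w]
        System.out.println(firstFullMatch("(John) (.+?) ", input)); // [John writes ]
    }
}
```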



Source: https://stackoverflow.com/questions/35761836/why-adding-a-space-after-can-completely-change-the-result
