Question
I have the following code:
public static void createTokens() {
    String test = "test is a word word word word big small";
    Matcher mtch = Pattern.compile("test is a (\\s*.+?\\s*) word (\\s*.+?\\s*)").matcher(test);
    while (mtch.find()) {
        for (int i = 1; i <= mtch.groupCount(); i++) {
            System.out.println(mtch.group(i));
        }
    }
}
And I get the following output:
word
w
But in my opinion it should be:
word
word
Can somebody please explain why?
Answer 1:
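Because your quantifiers are non-greedy (.+?), they match as little text as possible while still letting the overall match succeed. Nothing follows the second group in your pattern, so a single character, w, is enough to satisfy it.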
Remove the ? in the second group:
Matcher mtch = Pattern.compile("test is a (\\s*.+?\\s*) word (\\s*.+\\s*)").matcher(test);
and you'll get
word
word word big small
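To see the two behaviours side by side, here is a minimal sketch that runs both patterns against the string from the question. The class name GreedyVsLazy and the helper groups are made up for illustration and are not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsLazy {
    // Hypothetical helper: returns the capture groups of the first match,
    // or an empty list if the pattern does not match at all.
    static List<String> groups(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        if (m.find()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                out.add(m.group(i));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String test = "test is a word word word word big small";

        // Lazy second group: nothing follows it, so one character is enough.
        System.out.println(groups("test is a (\\s*.+?\\s*) word (\\s*.+?\\s*)", test));
        // prints [word, w]

        // Greedy second group: .+ runs to the end of the input.
        System.out.println(groups("test is a (\\s*.+?\\s*) word (\\s*.+\\s*)", test));
        // prints [word, word word big small]
    }
}
```

The first group stays lazy in both cases and still captures a full word, because the literal " word " that follows it forces the lazy quantifier to expand until that text can match.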
Answer 2:
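\\s* matches any number of whitespace characters, including zero, so a single w is enough to match (\\s*.+?\\s*). To make sure the group matches a whole word delimited by whitespace, try (\\s+.+?\\s+). Note that the literal spaces in your pattern already consume the separators around each word, so you would also need to drop them (for example "test is a(\\s+.+?\\s+)word(\\s+.+?\\s+)"), and the captured groups will then include the surrounding whitespace.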
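A quick sketch to check how the whitespace-delimited group behaves with the input from the question. The class WholeWordGroups and the helper firstMatch are hypothetical names, and the (\\S+) variant is an alternative added here for comparison, not something suggested in the answer above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WholeWordGroups {
    // Hypothetical helper: joins the groups of the first match with '|',
    // or returns null when the pattern does not match.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= m.groupCount(); i++) {
            if (i > 1) {
                sb.append('|');
            }
            sb.append(m.group(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String test = "test is a word word word word big small";

        // Drop-in replacement: the literal spaces already consumed the
        // separators, so \s+ has nothing left to match.
        System.out.println(firstMatch("test is a (\\s+.+?\\s+) word (\\s+.+?\\s+)", test));
        // prints null

        // Literal spaces removed: matches, but the groups keep their whitespace.
        System.out.println(firstMatch("test is a(\\s+.+?\\s+)word(\\s+.+?\\s+)", test));
        // prints " word | word " (without the quotes)

        // Alternative (assumption): capture a run of non-whitespace characters.
        System.out.println(firstMatch("test is a (\\S+) word (\\S+)", test));
        // prints word|word
    }
}
```

If the surrounding whitespace is unwanted, calling trim() on each group or using the (\\S+) form gives the word word output the question expected.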
Source: https://stackoverflow.com/questions/8931183/non-greedy-regular-expression-in-java