Question
I have to split a line of text into words and am confused about which regex to use. I have looked everywhere for a regex that matches a word and found ones similar to this post, but I want it in Java (Java doesn't accept a bare \ in regular string literals).
Regex to match words and those with an apostrophe
I have tried the regex from each answer and am unsure how to structure a regex for Java (I assumed all regexes were the same). If I replace \ with \\ in the regexes I see, they don't work.
I have also tried looking it up myself and have come to this page: http://www.regular-expressions.info/reference.html
But I cannot wrap my head around regex advanced techniques.
I am using String.split(regex string here) to separate my string. For example, if I'm given the following: "I like to eat but I don't like to eat everyone's food, or they'll starve." I want to match:
I
like
to
eat
but
I
don't
like
to
eat
everyone's
food
or
they'll
starve
I also don't want to match '' or '''' or ' ' or '.'' or other permutations. My delimiter conditions should be similar to: [match any word character][also match an apostrophe if it is preceded by a word character and then match word characters after it if there are any]
What I have so far is just a simple regex that matches word characters, [\w], but I am unsure how to use lookahead or lookbehind to match the apostrophe and then the remaining word characters.
Answer 1:
Using the answer from WhirlWind on the page mentioned in my comment, you can do the following:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String candidate = "I \n"+
"like \n"+
"to "+
"eat "+
"but "+
"I "+
"don't "+
"like "+
"to "+
"eat "+
"everyone's "+
"food "+
"'' '''' '.' ' "+
"or "+
"they'll "+
"starv'e'";
// Alternatives, tried left to right: 'word, word'word, word', plain word
String regex = "('\\w+)|(\\w+'\\w+)|(\\w+')|(\\w+)";
Matcher matcher = Pattern.compile(regex).matcher(candidate);
while (matcher.find()) {
    System.out.println("> matched: `" + matcher.group() + "`");
}
It will print:
> matched: `I`
> matched: `like`
> matched: `to`
> matched: `eat`
> matched: `but`
> matched: `I`
> matched: `don't`
> matched: `like`
> matched: `to`
> matched: `eat`
> matched: `everyone's`
> matched: `food`
> matched: `or`
> matched: `they'll`
> matched: `starv'e`
You can find a running example here: http://ideone.com/pVOmSK
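If the leading-apostrophe alternative ('\w+) is not wanted, the same idea can be written more compactly. The following is only a minimal sketch and not part of WhirlWind's answer: the class name WordMatcher and the method extractWords are made up for illustration, and the pattern \w+(?:'\w*)? assumes that a word carries at most one apostrophe, which must follow a word character.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class WordMatcher {
    // One or more word characters, optionally followed by a single
    // apostrophe and any further word characters (don't, everyone's).
    private static final Pattern WORD = Pattern.compile("\\w+(?:'\\w*)?");

    static List<String> extractWords(String text) {
        List<String> words = new ArrayList<>();
        Matcher matcher = WORD.matcher(text);
        while (matcher.find()) {
            words.add(matcher.group());
        }
        return words;
    }

    public static void main(String[] args) {
        String text = "I like to eat but I don't like to eat everyone's food, or they'll starve.";
        System.out.println(extractWords(text));
    }
}
```

Strings made only of apostrophes and punctuation, such as '' or '.', produce no match because every match has to start with a word character. A word followed by a closing quote, such as meat', keeps the apostrophe, just as the (\w+') alternative does in the regex above.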
Answer 2:
The following regex seems to split your sample string correctly, but it doesn't cover your scenario for the apostrophe.
[\s,.?!"]+
Java Code:
String input = "I like to eat but I don't like to eat everyone's food, or they'll starve.";
String[] inputWords = input.split("[\\s,.?!\"]+");
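On the escaping problem from the question: the regex engine has to receive [\s,.?!"]+, so inside a Java string literal each backslash is doubled and the double quote is escaped. A small sketch of this, assuming a made-up class name EscapeDemo and helper splitWords:

```java
import java.util.Arrays;

// Hypothetical demo class, for illustration only.
class EscapeDemo {
    // The literal "[\\s,.?!\"]+" holds the eleven characters [\s,.?!"]+
    static final String DELIMITERS = "[\\s,.?!\"]+";

    static String[] splitWords(String input) {
        return input.split(DELIMITERS);
    }

    public static void main(String[] args) {
        // Shows what the regex engine actually sees.
        System.out.println(DELIMITERS);
        // prints [\s,.?!"]+
        System.out.println(Arrays.toString(splitWords("I like to eat, or they'll starve.")));
        // prints [I, like, to, eat, or, they'll, starve]
    }
}
```

The trailing period does not create an empty last element, because String.split with a single argument drops trailing empty strings.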
If I understand correctly, the apostrophe should be left alone as long as it is after a word character. This next regex should cover the above plus the special case for the apostrophe.
(?<!\w)'|[\s,.?"!][\s,.?"'!]*
Java Code:
String input = "I like to eat but I don't like to eat everyone's food, or they'll starve.";
String[] inputWords = input.split("(?<!\\w)'|[\\s,.?\"!][\\s,.?\"'!]*");
If I run the second regex on the string "Hey there! Don't eat 'the mystery meat'.", I get the following words in my string array:
Hey
there
Don't
eat
the
mystery
meat'
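The trailing apostrophe in meat' survives because the lookbehind only removes apostrophes that are not preceded by a word character. If closing quotes should be removed as well, one possible variant adds a lookahead alternative. This is a sketch rather than the regex from this answer: the class name QuoteAwareSplitter and the method splitWords are hypothetical, and it assumes an apostrophe at the end of a word is never meaningful, so a plural possessive such as dogs' would lose its apostrophe.

```java
import java.util.Arrays;

// Hypothetical variant, for illustration only.
class QuoteAwareSplitter {
    // A delimiter is a run of: apostrophes not preceded by a word character,
    // apostrophes not followed by a word character, or whitespace/punctuation.
    private static final String DELIMITERS = "(?:(?<!\\w)'|'(?!\\w)|[\\s,.?\"!])+";

    static String[] splitWords(String input) {
        return input.split(DELIMITERS);
    }

    public static void main(String[] args) {
        String input = "Hey there! Don't eat 'the mystery meat'.";
        System.out.println(Arrays.toString(splitWords(input)));
        // prints [Hey, there, Don't, eat, the, mystery, meat]
    }
}
```

Wrapping all alternatives in one repeated group keeps adjacent delimiters such as '. from producing empty strings between them. An input that starts with a delimiter still yields an empty first element, which the caller would need to skip.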
Source: https://stackoverflow.com/questions/13632679/match-a-word-using-regex-that-also-handles-apostrophes