I have to separate a line of text into words and am confused about which regex to use. I have looked everywhere for a regex that matches a word and found some in the post below, but I need it in Java (Java treats \ as an escape character in ordinary string literals).
Regex to match words and those with an apostrophe
I have tried the regex from each answer there and am unsure how to structure it for Java (I assumed all regex flavors were the same). If I replace \ with \\ in the regexes I see, they don't work.
I have also tried looking it up myself and have come to this page: http://www.regular-expressions.info/reference.html
But I cannot wrap my head around the advanced regex techniques.
I am using String.split(regex string here) to separate my string. For example, given the following: "I like to eat but I don't like to eat everyone's food, or they'll starve." I want to match:
I
like
to
eat
but
I
don't
like
to
eat
everyone's
food
or
they'll
starve
I also don't want to match '' or '''' or ' ' or '.'' or other permutations. My matching conditions should be similar to: [match any word characters][also match an apostrophe if it is preceded by a word character, and then match the word characters after it, if there are any]
What I have is just a simple regex that matches word characters, [\w], but I am unsure how to use lookahead or lookbehind to match the apostrophe and then the rest of the word.
Using the answer from WhirlWind on the page mentioned in my comment, you can do the following:
// Requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
String candidate = "I \n"+
"like \n"+
"to "+
"eat "+
"but "+
"I "+
"don't "+
"like "+
"to "+
"eat "+
"everyone's "+
"food "+
"'' '''' '.' ' "+
"or "+
"they'll "+
"starv'e'";
String regex = "('\\w+)|(\\w+'\\w+)|(\\w+')|(\\w+)";
Matcher matcher = Pattern.compile(regex).matcher(candidate);
while (matcher.find()) {
    System.out.println("> matched: `" + matcher.group() + "`");
}
It will print:
> matched: `I`
> matched: `like`
> matched: `to`
> matched: `eat`
> matched: `but`
> matched: `I`
> matched: `don't`
> matched: `like`
> matched: `to`
> matched: `eat`
> matched: `everyone's`
> matched: `food`
> matched: `or`
> matched: `they'll`
> matched: `starv'e`
You can find a running example here: http://ideone.com/pVOmSK
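The four alternatives above also accept a leading apostrophe ('word) or a trailing one (word'). If only apostrophes inside a word should count, which is my reading of the question and not something the original answer states, a more compact pattern may be enough. The following is a minimal sketch under that assumption; the class name, method name and pattern are hypothetical and not part of WhirlWind's answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InteriorApostropheWords {
    // Hypothetical pattern: a run of word characters, optionally followed by
    // one or more groups of "apostrophe + word characters". An apostrophe at
    // the start or end of a word is never part of a match.
    private static final Pattern WORD = Pattern.compile("\\w+(?:'\\w+)*");

    // Collects every match of WORD in the given text, in order.
    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = WORD.matcher(text);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [I, don't, like, everyone's, food, or, they'll, starve]
        System.out.println(words("I don't like everyone's food, or they'll starve."));
        // prints [starv'e]
        System.out.println(words("'' '''' '.' ' starv'e'"));
    }
}
```

Note that \w in Java also matches digits and the underscore, so this sketch treats abc_1 as a single word.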
The following regex seems to cover your sample string correctly, but it doesn't cover your scenario for the apostrophe.
[\s,.?!"]+
Java Code:
String input = "I like to eat but I don't like to eat everyone's food, or they'll starve.";
String[] inputWords = input.split("[\\s,.?!\"]+");
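To see what this first delimiter does and does not handle, here is a small runnable sketch using the pattern [\s,.?!"]+ from above. The class and method names are only for illustration. Apostrophes are never treated as delimiters here, so contractions survive, but so do stray quote marks.

```java
import java.util.Arrays;

class SimpleSplitDemo {
    // Splits on runs of whitespace, comma, period, question mark,
    // exclamation mark and double quote. Apostrophes are left untouched.
    static String[] splitWords(String input) {
        return input.split("[\\s,.?!\"]+");
    }

    public static void main(String[] args) {
        // prints [I, don't, like, everyone's, food, or, they'll, starve]
        System.out.println(Arrays.toString(
                splitWords("I don't like everyone's food, or they'll starve.")));
        // prints [eat, 'the, mystery, meat']
        System.out.println(Arrays.toString(
                splitWords("eat 'the mystery meat'.")));
    }
}
```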
If I understand correctly, the apostrophe should be left alone as long as it is after a word character. This next regex should cover the above plus the special case for the apostrophe.
(?<!\w)'|[\s,.?"!][\s,.?"'!]*
Java Code:
String input = "I like to eat but I don't like to eat everyone's food, or they'll starve.";
String[] inputWords = input.split("(?<!\\w)'|[\\s,.?\"!][\\s,.?\"'!]*");
If I run the second regex on the string: Hey there! Don't eat 'the mystery meat'.
I get the following words in my string array:
Hey
there
Don't
eat
the
mystery
meat'
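The last entry still carries its closing quote, because the lookbehind only rejects apostrophes that are not preceded by a word character. If trailing apostrophes should be stripped as well, one possible variation is to add a matching lookahead. The sketch below is an assumption about what is wanted, not part of the regex above; the class name and the DELIMITER constant are hypothetical.

```java
import java.util.Arrays;

class ApostropheAwareSplit {
    // Hypothetical delimiter: one or more of
    //   - an apostrophe NOT preceded by a word character,
    //   - an apostrophe NOT followed by a word character,
    //   - whitespace or one of , . ? " !
    // An apostrophe with word characters on both sides is kept.
    private static final String DELIMITER =
            "(?:(?<!\\w)'|'(?!\\w)|[\\s,.?\"!])+";

    static String[] splitWords(String input) {
        return input.split(DELIMITER);
    }

    public static void main(String[] args) {
        // prints [Hey, there, Don't, eat, the, mystery, meat]
        System.out.println(Arrays.toString(
                splitWords("Hey there! Don't eat 'the mystery meat'.")));
    }
}
```

Two caveats: this also strips the apostrophe from plural possessives such as dogs', and String.split still returns a leading empty string when the input begins with a delimiter, so filter that out if it matters.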
Source: https://stackoverflow.com/questions/13632679/match-a-word-using-regex-that-also-handles-apostrophes