Match a word using regex that also handles apostrophes

吃可爱长大的小学妹 提交于 2019-12-01 19:35:22

Using answer from WhirlWind on the page stated in my comment you can do the following:

String candidate = "I \n"+
    "like \n"+
    "to "+
    "eat "+
    "but "+
    "I "+
    "don't "+
    "like "+
    "to "+
    "eat "+
    "everyone's "+
    "food "+
    "''  ''''  '.' ' "+
    "or "+
    "they'll "+
    "starv'e'";

String regex = "('\\w+)|(\\w+'\\w+)|(\\w+')|(\\w+)";
Matcher matcher = Pattern.compile(regex).matcher(candidate);
while (matcher.find()) {
  System.out.println("> matched: `" + matcher.group() + "`");
}

It will print:

> matched: `I`
> matched: `like`
> matched: `to`
> matched: `eat`
> matched: `but`
> matched: `I`
> matched: `don't`
> matched: `like`
> matched: `to`
> matched: `eat`
> matched: `everyone's`
> matched: `food`
> matched: `or`
> matched: `they'll`
> matched: `starv'e`

You can find a running example here: http://ideone.com/pVOmSK

The following regex seems to cover your sample string correctly. But it doesn't cover you scenario for the apostrophe.

[\s,.?!"]+

Java Code:

String input = "I like to eat but I don't like to eat everyone's food, or they'll starve.";
String[] inputWords = input.split("[\\s,.?!]+");

If I understand correctly, the apostrophe should be left alone as long as it is after a word character. This next regex should cover the above plus the special case for the apostrophe.

(?<!\w)'|[\s,.?"!][\s,.?"'!]*

Java Code:

String input = "I like to eat but I don't like to eat everyone's food, or they'll starve.";
String[] inputWords = input.split("(?<!\\w)'|[\\s,.?\"!][\\s,.?\"'!]*");

If I run the second regex on the string: Hey there! Don't eat 'the mystery meat'. I get the following words in my string array:

Hey
there
Don't
eat
the
mystery
meat'
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!