Match a word using regex that also handles apostrophes

瘦欲@ posted on 2019-12-19 09:01:23

Question


I have to separate a line of text into words and am confused about which regex to use. I have looked everywhere for a regex that matches a word and found ones similar to this post, but I want it in Java (Java doesn't accept a bare \ in ordinary string literals).

Regex to match words and those with an apostrophe

I have tried the regex from each answer and am unsure how to structure a regex for Java (I assumed all regex flavors were the same). If I replace \ with \\ in the regexes I see, the regex doesn't work.
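
For reference, the backslash rule for Java string literals can be sketched as below. This is a minimal illustration, not part of the original post; the class name EscapeDemo and the constant REGEX are made up for the example. Each backslash in a regex has to be doubled in Java source, so the regex \w+ is written "\\w+".

```java
import java.util.regex.Pattern;

class EscapeDemo {
    // Java source "\\w+" is the three-character string \w+ at runtime.
    // (EscapeDemo and REGEX are hypothetical names for this sketch.)
    static final String REGEX = "\\w+";

    public static void main(String[] args) {
        System.out.println(REGEX);                         // prints \w+
        System.out.println(REGEX.length());                // prints 3
        System.out.println(Pattern.matches(REGEX, "don")); // prints true
    }
}
```

Writing "\w+" with a single backslash does not compile, because \w is not a valid escape sequence in a Java string literal.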

I have also tried looking it up myself and have come to this page: http://www.regular-expressions.info/reference.html

But I cannot wrap my head around advanced regex techniques.

I am using String.split(regex string here) to separate my string. For example, if I'm given the following: "I like to eat but I don't like to eat everyone's food, or they'll starve." I want to match:

I
like
to
eat
but
I
don't
like
to
eat
everyone's
food
or
they'll
starve

I also don't want to match '' or '''' or ' ' or '.'' or other permutations. My delimiter conditions should be similar to: [match any word character][also match an apostrophe if it is preceded by a word character and then match word characters after it if there are any]

What I have so far is just a simple regex that matches words, [\w], but I am unsure how to use lookahead or lookbehind to match the apostrophe and then the remaining word characters.


Answer 1:


Using the answer from WhirlWind on the page mentioned in my comment, you can do the following:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String candidate = "I \n"+
    "like \n"+
    "to "+
    "eat "+
    "but "+
    "I "+
    "don't "+
    "like "+
    "to "+
    "eat "+
    "everyone's "+
    "food "+
    "''  ''''  '.' ' "+
    "or "+
    "they'll "+
    "starv'e'";

// Alternatives are tried in order: 'word, word'word, word', then a plain word
String regex = "('\\w+)|(\\w+'\\w+)|(\\w+')|(\\w+)";
Matcher matcher = Pattern.compile(regex).matcher(candidate);
while (matcher.find()) {
  System.out.println("> matched: `" + matcher.group() + "`");
}

It will print:

> matched: `I`
> matched: `like`
> matched: `to`
> matched: `eat`
> matched: `but`
> matched: `I`
> matched: `don't`
> matched: `like`
> matched: `to`
> matched: `eat`
> matched: `everyone's`
> matched: `food`
> matched: `or`
> matched: `they'll`
> matched: `starv'e`

You can find a running example here: http://ideone.com/pVOmSK
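
As a variation on the pattern above (an assumption, not WhirlWind's regex), the four alternatives can be collapsed into one expression that keeps an apostrophe only when it sits between word characters. The sketch below uses hypothetical names (WordMatcher, words) and collects the matches into a list:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordMatcher {
    // One or more word characters, followed by zero or more groups of
    // "apostrophe + word characters". Leading and trailing apostrophes
    // are never part of a match.
    static final Pattern WORD = Pattern.compile("\\w+(?:'\\w+)*");

    // Hypothetical helper: returns every match of WORD in the text, in order.
    static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = WORD.matcher(text);
        while (matcher.find()) {
            result.add(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "I like to eat but I don't like to eat everyone's food, or they'll starve.";
        System.out.println(words(input));
    }
}
```

Unlike the regex above, this variant would drop the apostrophe from a token such as dogs' or 'tis, so it only fits if those forms do not need to be preserved.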




Answer 2:


The following regex seems to split your sample string correctly, but it doesn't cover your scenario for the apostrophe.

[\s,.?!"]+

Java Code:

String input = "I like to eat but I don't like to eat everyone's food, or they'll starve.";
String[] inputWords = input.split("[\\s,.?!\"]+");

If I understand correctly, the apostrophe should be left alone as long as it is after a word character. This next regex should cover the above plus the special case for the apostrophe.

(?<!\w)'|[\s,.?"!][\s,.?"'!]*

Java Code:

String input = "I like to eat but I don't like to eat everyone's food, or they'll starve.";
String[] inputWords = input.split("(?<!\\w)'|[\\s,.?\"!][\\s,.?\"'!]*");

If I run the second regex on the string "Hey there! Don't eat 'the mystery meat'." I get the following words in my string array:

Hey
there
Don't
eat
the
mystery
meat'
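
The split approach can also be wrapped in a small helper. The sketch below reuses the second regex unchanged; the class name SplitWords and the helper words are hypothetical. It additionally skips empty tokens, which String.split can return when the input begins with a delimiter such as an opening quote.

```java
import java.util.ArrayList;
import java.util.List;

class SplitWords {
    // Delimiter regex from the answer: an apostrophe not preceded by a word
    // character, or a run of whitespace/punctuation.
    static final String DELIMITERS = "(?<!\\w)'|[\\s,.?\"!][\\s,.?\"'!]*";

    // Hypothetical helper: splits on DELIMITERS and drops empty tokens.
    static List<String> words(String input) {
        List<String> result = new ArrayList<>();
        for (String token : input.split(DELIMITERS)) {
            if (!token.isEmpty()) {
                result.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("'Hey there!' Don't eat 'the mystery meat'."));
    }
}
```

The trailing apostrophe in meat' is still kept, because the lookbehind only treats an apostrophe as a delimiter when no word character precedes it.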


Source: https://stackoverflow.com/questions/13632679/match-a-word-using-regex-that-also-handles-apostrophes
