Regular Expression For Duplicate Words

终归单人心 2020-11-22 11:13

I'm a regular expression newbie, and I can't quite figure out how to write a single regular expression that would "match" any duplicate consecutive words such as

13 Answers
  •  有刺的猬
    2020-11-22 11:52

    The expression below should find any number of consecutive repetitions of a word. With Pattern.CASE_INSENSITIVE, the matching ignores case.

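    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String regex = "\\b(\\w+)(\\s+\\1\\b)*";
    Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
    
    Matcher m = p.matcher(input);
    
    // Replace every match (a word plus any immediate repeats of it) with its
    // first occurrence, group 1. Matcher.replaceAll works from the match
    // positions, so the matched text is never reinterpreted as a regex and
    // occurrences elsewhere in the string are not touched by accident.
    input = m.replaceAll("$1");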
    

    Sample input: Goodbye goodbye GooDbYe

    Sample output: Goodbye
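    A self-contained version of the same approach, assuming Java 8 or later. The class name DuplicateWordRemover and the method collapseRepeats are hypothetical names chosen for this sketch; the pattern and flag are the ones from the answer, and the first call reproduces the sample input and output above.

```java
import java.util.regex.Pattern;

final class DuplicateWordRemover {
    // The answer's pattern: a word (group 1) followed by zero or more
    // whitespace-separated copies of the same word. CASE_INSENSITIVE makes
    // the back-reference \1 ignore case as well.
    private static final Pattern REPEATED_WORDS =
            Pattern.compile("\\b(\\w+)(\\s+\\1\\b)*", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: replaces each run of repeated words with its first occurrence.
    static String collapseRepeats(String input) {
        return REPEATED_WORDS.matcher(input).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(collapseRepeats("Goodbye goodbye GooDbYe")); // prints "Goodbye"
        System.out.println(collapseRepeats("the the theory"));          // prints "the theory"
    }
}
```

    The \b after \1 is what keeps "the the theory" from also swallowing the start of "theory": the second copy must end at a word boundary.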

    Explanation:

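    The regular expression:

    \b: A word boundary, so a match can only start at the beginning of a word.

    (\w+): One or more word characters (letters, digits, underscore), captured as group 1.

    (\s+\1\b)*: One or more whitespace characters followed by the same text as group 1 (\1), ending at a word boundary. The trailing * lets this group repeat, so runs of three or more copies are caught in a single match. With Pattern.CASE_INSENSITIVE, the back-reference \1 also ignores case, which is why goodbye and GooDbYe count as repeats of Goodbye. Because * also allows zero repetitions, every single word matches too and is simply replaced by itself; using + instead would match only words that actually repeat.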
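    To see the effect of the final quantifier, this sketch (assuming Java 8 or later) lists every match that find() reports for the * and + versions of the pattern. StarVersusPlus and matches are hypothetical names, and "Paris in the the spring" is only an illustrative input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class StarVersusPlus {
    // Returns group(0) of every match that Matcher.find() reports.
    static List<String> matches(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "Paris in the the spring";
        // With *, every word matches, repeated or not.
        System.out.println(matches("\\b(\\w+)(\\s+\\1\\b)*", text)); // [Paris, in, the the, spring]
        // With +, only runs of two or more copies match.
        System.out.println(matches("\\b(\\w+)(\\s+\\1\\b)+", text)); // [the the]
    }
}
```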

    Groups:

    m.group(0): The entire match; in the example above, Goodbye goodbye GooDbYe

    m.group(1): The first word of the match; in the example above, Goodbye
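    The group values can be checked directly. This sketch, assuming Java 8 or later, prints every group of the first match; GroupDemo and groupsOfFirstMatch are hypothetical names. It also shows group(2): a capturing group inside a repetition keeps only its last iteration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupDemo {
    private static final Pattern REPEATED_WORDS =
            Pattern.compile("\\b(\\w+)(\\s+\\1\\b)*", Pattern.CASE_INSENSITIVE);

    // Returns group(0) .. group(groupCount()) for the first match, or an empty list.
    static List<String> groupsOfFirstMatch(String input) {
        Matcher m = REPEATED_WORDS.matcher(input);
        List<String> groups = new ArrayList<>();
        if (m.find()) {
            for (int i = 0; i <= m.groupCount(); i++) {
                groups.add(m.group(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> g = groupsOfFirstMatch("Goodbye goodbye GooDbYe");
        System.out.println("group(0) = '" + g.get(0) + "'"); // 'Goodbye goodbye GooDbYe'
        System.out.println("group(1) = '" + g.get(1) + "'"); // 'Goodbye'
        // A repeated group keeps only its last iteration, including the leading space.
        System.out.println("group(2) = '" + g.get(2) + "'"); // ' GooDbYe'
    }
}
```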

    Replacing each match with group 1 therefore collapses every run of consecutive repeated words to the first instance of the word.
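    If you need to act on each match, for example to log which runs were collapsed, an explicit while (m.find()) loop can build the result with Matcher.appendReplacement and Matcher.appendTail, which work from the match positions rather than feeding matched text back in as a new regex. This is a sketch assuming Java 9 or later (for the StringBuilder overloads); LoopingCollapse, collapse and the removed list are hypothetical names. It uses + rather than *, so only real repeats are reported.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LoopingCollapse {
    // + instead of *, so find() stops only at words that really repeat.
    private static final Pattern REPEATS =
            Pattern.compile("\\b(\\w+)(\\s+\\1\\b)+", Pattern.CASE_INSENSITIVE);

    // Collapses each run to its first word and records each matched run in 'removed'.
    static String collapse(String input, List<String> removed) {
        Matcher m = REPEATS.matcher(input);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            removed.add(m.group());
            // Appends the text since the previous match, then group 1 in place of this match.
            m.appendReplacement(out, "$1");
        }
        m.appendTail(out); // text after the last match
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> removed = new ArrayList<>();
        String result = collapse("Goodbye goodbye GooDbYe, said the the man", removed);
        System.out.println(result);  // Goodbye, said the man
        System.out.println(removed); // [Goodbye goodbye GooDbYe, the the]
    }
}
```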
