Regular Expression For Duplicate Words

终归单人心 2020-11-22 11:13

I'm a regular expression newbie, and I can't quite figure out how to write a single regular expression that would "match" any duplicate consecutive words such as

13 Answers
  • 2020-11-22 11:56

    This expression (inspired from Mike, above) seems to catch all duplicates, triplicates, etc, including the ones at the end of the string, which most of the others don't:

    s.replace(/(^|\s+)(\S+)(($|\s+)\2)+/g, "$1$2")
    

    I know the question asked to match duplicates only, but a triplicate is just 2 duplicates next to each other :)

    First, I put (^|\s+) to make sure the match starts at a full word; otherwise "child's steak" would become "child'steak" (the two "s" characters would match as a duplicate). Then it matches a full word ((\S+) above, (\b\S+\b) in the test below), followed by one or more repetitions of: end of string ($) or a run of whitespace (\s+), then the same word again.

    I tried it like this and it worked well:

    var s = "here here here     here is ahi-ahi ahi-ahi ahi-ahi joe's joe's joe's joe's joe's the result result     result";
    console.log(s.replace(/(\b\S+\b)(($|\s+)\1)+/g, "$1"));
    // --> here is ahi-ahi joe's the result
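
    One caveat with the test pattern above: nothing anchors the end of \1, so "the theory" is also collapsed (to "theory"). A minimal sketch of one way to guard against that, assuming whitespace-delimited words; collapseRepeats is a hypothetical helper name, and the (?!\S) lookahead is an addition that is not part of the original answer:

    ```javascript
    // Sketch: collapse runs of a repeated whitespace-delimited word.
    // collapseRepeats is a hypothetical name used only for illustration.
    function collapseRepeats(text) {
      // (^|\s) makes the match start at a full word.
      // (?!\S) makes each repetition end at whitespace or end of string,
      // so a longer word sharing the same prefix is not treated as a repeat.
      return text.replace(/(^|\s)(\S+)(?:\s+\2(?!\S))+/g, "$1$2");
    }

    console.log(collapseRepeats("the theory of the the thing"));
    // --> the theory of the thing
    ```

    The leading whitespace is captured and put back via $1, so the spacing before the surviving word is preserved.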
    
  • 2020-11-22 11:59

    The widely used PCRE library can handle such situations (you won't achieve the same with POSIX-compliant regex engines, though):

    (\b\w+\b)\W+\1
    
  • 2020-11-22 11:59

    The example in JavaScript: The Good Parts can be adapted to do this:

    var doubled_words = /([A-Za-z\u00C0-\u1FFF\u2800-\uFFFD]+)\s+\1(?:\s|$)/gi;
    

    \b uses \w to decide where word boundaries fall, and \w is equivalent to [0-9A-Z_a-z], so accented and other non-ASCII letters are not treated as word characters. If you don't mind that limitation, the accepted answer is fine.
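
    To make the limitation concrete, here is a small sketch comparing an ASCII-only pattern with the character-class pattern above. The names asciiDoubled, wideDoubled and hasDoubledWord are hypothetical, and the g flag is dropped here only so that .test() carries no lastIndex state between calls:

    ```javascript
    // ASCII-only: \w and \b treat "é" as a non-word character,
    // so "café" is seen as "caf" followed by punctuation.
    const asciiDoubled = /(\b\w+\b)\s+\1\b/i;

    // Character-class pattern from the answer, without the g flag.
    const wideDoubled = /([A-Za-z\u00C0-\u1FFF\u2800-\uFFFD]+)\s+\1(?:\s|$)/i;

    // Hypothetical helper: true if the text contains a doubled word.
    function hasDoubledWord(text, pattern) {
      return pattern.test(text);
    }

    const sample = "un caf\u00E9 caf\u00E9 noir"; // "un café café noir"
    console.log(hasDoubledWord(sample, asciiDoubled)); // --> false
    console.log(hasDoubledWord(sample, wideDoubled));  // --> true
    ```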

  • 2020-11-22 12:08

    Regex to Strip 2+ duplicate words (consecutive/non-consecutive words)

    Try this regex, which catches 2 or more duplicate words and leaves only a single occurrence behind. The duplicate words need not even be consecutive.

    /\b(\w+)\b(?=.*?\b\1\b)/ig
    

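    Here, \b matches a word boundary, (?=...) is a positive lookahead, and \1 is a backreference to the captured word. A word is matched only when the same word appears again later in the string, so replacing each match with an empty string removes every occurrence except the last.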
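
    A minimal sketch of how this pattern might be used for stripping. stripRepeatedWords is a hypothetical helper name, and the whitespace cleanup afterwards is an assumption about the desired output rather than part of the regex itself:

    ```javascript
    // Sketch: remove every occurrence of a word except the last one.
    // stripRepeatedWords is a hypothetical name used only for illustration.
    function stripRepeatedWords(text) {
      return text
        // Drop a word whenever the same word occurs again later on the line.
        .replace(/\b(\w+)\b(?=.*?\b\1\b)/gi, "")
        // The removals leave gaps behind, so squeeze whitespace (an assumption).
        .replace(/\s+/g, " ")
        .trim();
    }

    console.log(stripRepeatedWords("alpha beta alpha gamma beta"));
    // --> alpha gamma beta
    ```

    Because "." does not match newlines by default, the lookahead only sees duplicates on the same line, and with the i flag "The" and "the" count as the same word.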


  • 2020-11-22 12:12

    Here is one that catches multiple words multiple times:

    (\b\w+\b)(\s+\1)+
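
    As a sketch of how such a pattern could be used to report repeats rather than replace them, assuming an engine with String.prototype.matchAll (ES2020). findRepeatedRuns is a hypothetical helper name, and the trailing \b after \1 is an addition so that "the theory" is not reported:

    ```javascript
    // Sketch: list each run of a repeated word with its length and position.
    // findRepeatedRuns is a hypothetical name used only for illustration.
    function findRepeatedRuns(text) {
      const pattern = /\b(\w+)\b(?:\s+\1\b)+/g;
      return Array.from(text.matchAll(pattern), (m) => ({
        word: m[1],                      // the repeated word
        count: m[0].split(/\s+/).length, // how many times it appears in the run
        index: m.index,                  // where the run starts
      }));
    }

    console.log(findRepeatedRuns("it is is what what what it is"));
    // --> [ { word: 'is', count: 2, index: 3 },
    //       { word: 'what', count: 3, index: 9 } ]
    ```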
    
  • 2020-11-22 12:13

    I believe this regex handles more situations:

    /(\b\S+\b)\s+\b\1\b/
    

    A good selection of test strings can be found here: http://callumacrae.github.com/regex-tuesday/challenge1.html
