Regular Expression For Duplicate Words

I'm a regular expression newbie, and I can't quite figure out how to write a single regular expression that would "match" any duplicate consecutive words such as

13 Answers

野趣味

2020-11-22 11:56
This expression (inspired by Mike, above) seems to catch all duplicates, triplicates, etc., including the ones at the end of the string, which most of the others don't:
```
s.replace(/(^|\s+)(\S+)(($|\s+)\2)+/g, "$1$2")
```
I know the question asked to match duplicates only, but a triplicate is just 2 duplicates next to each other :)

First, I put (^|\s+) to make sure the match starts at a full word; otherwise "child's steak" would become "child'steak" (the two "s" characters would match as a duplicate). Then it matches a full word ((\b\S+\b)), followed by an end of string ($) or a run of whitespace (\s+) and the same word again, with that last group repeated one or more times.

I tried it like this and it worked well:
```
var s = "here here here     here is ahi-ahi ahi-ahi ahi-ahi joe's joe's joe's joe's joe's the result result     result";
console.log(s.replace(/(\b\S+\b)(($|\s+)\1)+/g, "$1"));
// --> here is ahi-ahi joe's the result
```
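To make the "child's steak" point concrete, here is a minimal sketch that runs both patterns from this answer on that phrase. The variable names are only illustrative, and the behavior shown is that of JavaScript's built-in RegExp.

```javascript
// Two patterns from this answer, applied to a phrase with no real duplicates.
const boundaryPattern = /(\b\S+\b)(($|\s+)\1)+/g;    // relies on \b only
const anchoredPattern = /(^|\s+)(\S+)(($|\s+)\2)+/g; // word must follow start-of-string or whitespace

const phrase = "child's steak";

// \b sees a boundary between "'" and "s", so "s s" counts as a repeated "word".
const boundaryResult = phrase.replace(boundaryPattern, "$1");

// Here a candidate can only start at ^ or after whitespace, so nothing matches.
const anchoredResult = phrase.replace(anchoredPattern, "$1$2");

console.log(boundaryResult); // child'steak
console.log(anchoredResult); // child's steak
```

The replacement string for the anchored pattern is "$1$2" rather than "$1" because the leading whitespace is now part of the match and has to be put back.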
日久生厌

2020-11-22 11:59
The widely used PCRE library can handle such situations (you won't achieve the same with POSIX-compliant regex engines, though):
```
(\b\w+\b)\W+\1
```
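Since the question asks to match duplicates rather than remove them, here is a small sketch that lists the repeated words. It uses JavaScript's RegExp instead of PCRE (an assumption made only to stay in the thread's language; this particular pattern uses no PCRE-specific syntax), and the sample text is made up.

```javascript
// The pattern from this answer, with the g flag so every pair is reported.
const duplicatePair = /(\b\w+\b)\W+\1/g;

const text = "Paris in the the spring and and summer";

// matchAll yields one match object per duplicate; group 1 is the repeated word.
const repeatedWords = Array.from(text.matchAll(duplicatePair), (m) => m[1]);

console.log(repeatedWords.join(", ")); // the, and
```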
遥遥无期

2020-11-22 11:59
The example in JavaScript: The Good Parts can be adapted to do this:
```
var doubled_words = /([A-Za-z\u00C0-\u1FFF\u2800-\uFFFD]+)\s+\1(?:\s|$)/gi;
```
\b uses \w for word boundaries, where \w is equivalent to [0-9A-Z_a-z]. If you don't mind that limitation, the accepted answer is fine.
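The limitation is easy to see with an accented word. In the sketch below, `asciiOnly` is a hypothetical stand-in for a \b/\w-based pattern (the accepted answer itself is not reproduced here), and the sample sentence is invented for illustration.

```javascript
// Hypothetical ASCII-only pattern built on \b and \w.
const asciiOnly = /\b(\w+)\s+\1\b/gi;

// The character-class pattern from this answer.
const doubledWords = /([A-Za-z\u00C0-\u1FFF\u2800-\uFFFD]+)\s+\1(?:\s|$)/gi;

const sample = "un café café noir";

// "é" is not a \w character, so the ASCII-only pattern never sees a whole word "café".
const asciiMatches = sample.match(asciiOnly);

// The explicit character class includes "é" (U+00E9), so the duplicate is found.
const unicodeMatches = sample.match(doubledWords);

console.log(asciiMatches);                      // null
console.log(JSON.stringify(unicodeMatches[0])); // "café café "
```

Note that the match includes the trailing whitespace, because the pattern ends with (?:\s|$) instead of a zero-width \b.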
旧巷少年郎

2020-11-22 12:08
Regex to Strip 2+ duplicate words (consecutive/non-consecutive words)

Try this regex, which catches two or more duplicate words and leaves behind a single occurrence. The duplicate words need not even be consecutive.
```
/\b(\w+)\b(?=.*?\b\1\b)/ig
```
Here, \b matches a word boundary, ?= is a positive lookahead, and \1 is a back-reference to the first captured group.
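A quick sketch of how this behaves in a replace call, using a made-up input string. Two things are worth noticing: the lookahead removes every occurrence that has a later copy, so the last occurrence is the one that survives, and the whitespace around removed words stays behind. The tidy-up step is an addition for illustration, not part of the answer's regex.

```javascript
// The pattern from this answer: a word that appears again later on the same line.
const repeatedLater = /\b(\w+)\b(?=.*?\b\1\b)/ig;

const input = "alpha beta alpha gamma beta";

// The first "alpha" and the first "beta" are removed; their spaces remain.
const stripped = input.replace(repeatedLater, "");

// Illustrative clean-up: collapse runs of whitespace and trim the ends.
const tidied = stripped.replace(/\s+/g, " ").trim();

console.log(JSON.stringify(stripped)); // "  alpha gamma beta"
console.log(tidied);                   // alpha gamma beta
```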

Example Source
梦谈多话

2020-11-22 12:12
Here is one that catches multiple words multiple times:
```
(\b\w+\b)(\s+\1)+
```
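The trailing + is what matters for triplicates. As a rough comparison, the sketch below runs a pair-only variant (a hypothetical pattern written here for contrast) next to the pattern from this answer on an invented sentence.

```javascript
// Hypothetical pair-only variant: one word followed by one repeat.
const pairOnly = /(\b\w+\b)\s+\1/g;

// The pattern from this answer: one word followed by one or more repeats.
const repeatedRun = /(\b\w+\b)(\s+\1)+/g;

const sentence = "it is is is fine fine";

// A single pass consumes "is is" and then cannot pair the third "is" with anything.
const afterPairOnly = sentence.replace(pairOnly, "$1");

// The + quantifier swallows the whole run in one match.
const afterRepeatedRun = sentence.replace(repeatedRun, "$1");

console.log(afterPairOnly);    // it is is fine
console.log(afterRepeatedRun); // it is fine
```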
萌比男神i

2020-11-22 12:13
I believe this regex handles more situations:
```
/(\b\S+\b)\s+\b\1\b/
```
A good selection of test strings can be found here: http://callumacrae.github.com/regex-tuesday/challenge1.html
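One situation where the extra \b after \1 helps is a word followed by a longer word with the same prefix. The sketch below compares this answer's pattern with the shorter (\b\w+\b)\W+\1 pattern from earlier in the thread; the test strings are invented, not taken from the linked challenge page.

```javascript
// The pattern from this answer: the repeat must itself be a whole word.
const strictDuplicate = /(\b\S+\b)\s+\b\1\b/;

// The shorter pattern from an earlier answer, with no boundary after \1.
const looseDuplicate = /(\b\w+\b)\W+\1/;

// "the theory" is not a duplicate, but the loose pattern matches "the the".
const looseOnPrefix = looseDuplicate.test("the theory");
const strictOnPrefix = strictDuplicate.test("the theory");

// \S+ also keeps words with apostrophes together.
const apostropheMatch = "joe's joe's bike".match(strictDuplicate);

console.log(looseOnPrefix);      // true
console.log(strictOnPrefix);     // false
console.log(apostropheMatch[0]); // joe's joe's
```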