Is there a way to remove duplicate and continuous words/phrases in a string?

广开言路 2021-01-13 11:27

Is there a way to remove consecutive duplicate words/phrases in a string? For example:

[in]: foo foo bar bar foo bar

6 answers
  •  旧巷少年郎
    2021-01-13 11:56

    You can use the re module for that.

    >>> import re
    >>> s = 'foo foo bar bar'
    >>> re.sub(r'\b(.+)\s+\1\b', r'\1', s)
    'foo bar'
    
    >>> s = 'foo bar foo bar foo bar'
    >>> re.sub(r'\b(.+)\s+\1\b', r'\1', s)
    'foo bar foo bar'
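
To see why the pattern above behaves this way, here is a minimal sketch that spells out the same regular expression in verbose mode, with each part commented. The variable names `pattern` and `m` are only illustrative and are not part of the original answer.

```python
import re

# The same pattern as above, written with re.VERBOSE so each part can be commented.
pattern = re.compile(r"""
    \b(.+)   # group 1: a word or phrase starting at a word boundary
    \s+      # whitespace separating it from its repeat
    \1       # backreference: the exact text captured by group 1
    \b       # the repeat must end at a word boundary
""", re.VERBOSE)

m = pattern.search('foo foo bar bar')
whole_match = m.group(0)   # the phrase together with its repeat
kept_phrase = m.group(1)   # the part that r'\1' puts back

collapsed = pattern.sub(r'\1', 'foo foo bar bar')
```

Because `\1` matches only one repeat, a phrase that occurs three times in a row is reduced to two occurrences, which is what the second session above shows.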
    

    If you want to match any number of consecutive occurrences:

    >>> s = 'foo bar foo bar foo bar'
    >>> re.sub(r'\b(.+)(\s+\1\b)+', r'\1', s)
    'foo bar'    
    

    Edit: an addition for your last example. To handle it, keep calling re.sub for as long as the string still contains duplicate phrases:

    >>> s = 'this is a sentence sentence sentence this is a sentence where phrases phrases duplicate where phrases duplicate'
    >>> while re.search(r'\b(.+)(\s+\1\b)+', s):
    ...   s = re.sub(r'\b(.+)(\s+\1\b)+', r'\1', s)
    ...
    >>> s
    'this is a sentence where phrases duplicate'
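
The loop above can be wrapped in a reusable function. This is only a sketch: the names `collapse_repeats` and `_REPEAT` are hypothetical, and using `subn` to detect whether anything changed is a small variation on the `re.search` check in the answer, not a different technique.

```python
import re

# Same pattern as in the loop above: a phrase followed by one or more
# whitespace-separated repeats of itself.
_REPEAT = re.compile(r'\b(.+)(\s+\1\b)+')

def collapse_repeats(text):
    """Collapse consecutive duplicate words/phrases until none remain."""
    while True:
        # subn returns the new string and the number of substitutions made.
        text, count = _REPEAT.subn(r'\1', text)
        if count == 0:
            return text
```

Each substitution makes the string strictly shorter, so the loop always terminates. For example, `collapse_repeats('foo bar foo bar foo bar')` should give `'foo bar'`.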
    
