Remove duplicate chars using regex?

情书的邮戳 2020-11-30 06:17

Let's say I want to remove all duplicate chars (of a particular char) in a string using regular expressions. This is simple:

import re
re.sub("a*", "a"…


3 Answers
  • 2020-11-30 06:37
    >>> import re
    >>> re.sub(r'([a-z])\1+', r'\1', 'ffffffbbbbbbbqqq')
    'fbq'
    

    The parentheses around [a-z] define a capture group, and the \1 (a backreference) in both the pattern and the replacement refers to the contents of that first capture group.

    Thus, the regex reads "find a letter, followed by one or more occurrences of that same letter", and the entire matched portion is then replaced with a single occurrence of that letter.
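
    As a rough sketch, the same idea can be narrowed to one particular character, which is what the question asks for. The helper names squeeze_char and squeeze_letters are made up for this illustration and are not part of the re module; re.escape is used so that characters such as '.' or '[' are treated literally.

```python
import re

def squeeze_char(text, ch):
    """Collapse every run of the single character `ch` in `text` to one `ch`.

    Hypothetical helper, for illustration only.
    """
    # re.escape makes special characters such as '.' or '*' match literally.
    pattern = re.escape(ch) + "{2,}"
    # A function replacement returns `ch` as-is, with no template processing.
    return re.sub(pattern, lambda m: ch, text)

def squeeze_letters(text):
    """Collapse runs of any lowercase ASCII letter using a backreference."""
    return re.sub(r'([a-z])\1+', r'\1', text)

print(squeeze_char('aaabbbccc', 'a'))       # abbbccc
print(squeeze_letters('ffffffbbbbbbbqqq'))  # fbq
```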

    On a side note...

    Your example code for just the letter a is actually buggy:

    >>> re.sub('a*', 'a', 'aaabbbccc')
    'abababacacaca'
    

    You really want 'a+' for your regex instead of 'a*': the * operator matches "0 or more" occurrences, so it also matches the empty string between two non-a characters, whereas the + operator requires "1 or more".
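
    A quick way to see the difference is to run both patterns side by side. This is only a sketch. The exact string produced by 'a*' depends on the Python version (since Python 3.7, re.sub also replaces empty matches that are adjacent to a previous non-empty match), so no expected output is given for it here.

```python
import re

text = 'aaabbbccc'

# '*' allows zero-length matches, so an 'a' is also inserted at empty positions.
with_star = re.sub('a*', 'a', text)

# '+' needs at least one 'a', so only the real run of a's is replaced.
with_plus = re.sub('a+', 'a', text)

print(with_star)
print(with_plus)  # abbbccc
```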

  • 2020-11-30 06:37

    If you also want to remove non-contiguous duplicates, you have to wrap the substitution in a loop, for example:

     import re

     s = "ababacbdefefbcdefde"

     # Each pass removes at least one repeated letter; loop until none remain.
     while re.search(r'([a-z])(.*)\1', s):
         s = re.sub(r'([a-z])(.*)\1', r'\1\2', s)

     print(s)  # prints 'abcdef'
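
    For comparison, a non-regex alternative (swapped in here purely as a cross-check, not something this answer proposes) keeps the first occurrence of each character by relying on dict preserving insertion order, which is guaranteed from Python 3.7. The function names below are made up for this sketch.

```python
import re

def dedupe_with_regex(s):
    """The regex loop wrapped in a function (lowercase letters only)."""
    pattern = re.compile(r'([a-z])(.*)\1')
    while pattern.search(s):
        s = pattern.sub(r'\1\2', s)
    return s

def dedupe_with_dict(s):
    """Keep the first occurrence of every character, in original order."""
    return "".join(dict.fromkeys(s))

sample = "ababacbdefefbcdefde"
print(dedupe_with_regex(sample))  # abcdef
print(dedupe_with_dict(sample))   # abcdef
```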
    
  • 2020-11-30 06:42

    A solution that covers every category of character, not just letters:

    re.sub(r'(.)\1+', r'\1', 'aaaaabbbbbb[[[[[')
    

    gives:

    'ab['
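
    One caveat worth sketching: by default `.` does not match a newline, so runs of newlines are left alone unless the re.DOTALL flag is passed. The sample string is made up for illustration.

```python
import re

text = 'aa\n\n\nbb'

# Default: '.' skips newlines, so only the letter runs collapse.
default_result = re.sub(r'(.)\1+', r'\1', text)

# With re.DOTALL, '.' also matches a newline, so the newline run collapses too.
dotall_result = re.sub(r'(.)\1+', r'\1', text, flags=re.DOTALL)

print(repr(default_result))  # 'a\n\n\nb'
print(repr(dotall_result))   # 'a\nb'
```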
    