How to use re to find consecutive, repeated chars

醉梦人生 2020-12-01 11:05

I want to find all consecutive, repeated character blocks in a string. For example, consider the following:

s = r'http://www.google.com/search=ooo-jjj'


        
3 Answers
  • 2020-12-01 11:37

    Your code is almost right; just replace search with finditer. Note that finditer returns an iterator of match objects rather than a single match, so you can collect every span:

    m = [(x.start(),x.end()) for x in re.finditer(r'(\w)\1\1', s)]
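
    As a quick check, here is a minimal sketch that runs the same comprehension on the string from the question. The variable names spans and blocks are only illustrative. Note that (\w)\1\1 consumes exactly three characters per match, so a run longer than three is not returned as a single block.

```python
import re

s = r'http://www.google.com/search=ooo-jjj'

# Each match object knows where it starts and ends in s
spans = [(m.start(), m.end()) for m in re.finditer(r'(\w)\1\1', s)]
print(spans)  # [(7, 10), (29, 32), (33, 36)]

# Slicing with a span recovers the repeated block itself
blocks = [s[start:end] for start, end in spans]
print(blocks)  # ['www', 'ooo', 'jjj']
```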
    
  • 2020-12-01 11:38

    The following code should solve your problem:

    import re

    s = "abc def aaa bbb ccc def hhh"

    # Print each block of three identical word characters
    for match in re.finditer(r"(\w)\1\1", s):
        print(s[match.start():match.end()])
    
  • 2020-12-01 11:51

    ((\w)\2{2,}) matches 3 or more consecutive identical characters:

    In [71]: import re
    In [72]: s = r'http://www.google.com/search=ooo-jjjj'
    In [73]: re.findall(r'((\w)\2{2,})', s)
    Out[73]: [('www', 'w'), ('ooo', 'o'), ('jjjj', 'j')]
    
    In [78]: [match[0] for match in re.findall(r'((\w)\2{2,})', s)]
    Out[78]: ['www', 'ooo', 'jjjj']
    

    (\w) matches any alphanumeric character or underscore.

    ((\w)\2) matches any alphanumeric character followed by the same character, since \2 matches the contents of group number 2. Since I nested the parentheses, group number 2 refers to the character matched by \w.
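
    To see the group numbering in action, the following sketch inspects the groups of a single match. The sample word 'google' is an assumption chosen only for illustration.

```python
import re

# Groups are numbered by the position of their opening parenthesis:
# group 1 is the outer pair, group 2 is the inner (\w).
m = re.search(r'((\w)\2)', 'google')

print(m.group(1))  # oo (the whole doubled block)
print(m.group(2))  # o  (the single character captured by \w)
```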

    Then putting it all together, ((\w)\2{2,}) matches any alphanumeric character, followed by the same character repeated 2 or more additional times.

    In total, that means the regex requires the character to be repeated 3 or more times.
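
    If the minimum run length needs to vary, the same idea can be wrapped in a small helper. This is only a sketch: the function name find_runs and its min_len parameter are hypothetical, not part of the re module. It uses finditer with a single capture group, so each result is the whole run rather than a tuple.

```python
import re

def find_runs(text, min_len=3):
    """Return every run of min_len or more identical word characters."""
    if min_len < 1:
        raise ValueError("min_len must be at least 1")
    # One captured character plus (min_len - 1) or more back-references to it
    pattern = re.compile(r'(\w)\1{%d,}' % (min_len - 1))
    # group(0) is the whole matched run
    return [m.group(0) for m in pattern.finditer(text)]

s = r'http://www.google.com/search=ooo-jjjj'
print(find_runs(s))     # ['www', 'ooo', 'jjjj']
print(find_runs(s, 2))  # ['tt', 'www', 'oo', 'ooo', 'jjjj']
```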
