Python RegEx that matches char followed/preceded by same char but uppercase/lowercase

野的像风 2020-12-20 06:56

I am trying to build a regex that matches aA, Aa, bB, cC but does not match aB, aa, AA, aC, Ca.

- if we meet a lowercase letter, we want to check

2 Answers
  • 2020-12-20 07:32

    You can do it with the PyPI regex module. The same pattern also works in Java, PCRE (PHP, R, Delphi), Perl and .NET, but not in ECMAScript (JavaScript, C++ std::regex) or RE2 (Go, Google Apps Script):

    (\p{L})(?!\1)(?i:\1)
    

    See the regex demo and a proof it works in Python:

    import regex
    rx = r'(\p{L})(?!\1)(?i:\1)'
    print([x.group() for x in regex.finditer(rx, ' aA, Aa, bB, cC but not aB, aa, AA, aC, Ca')])
    # => ['aA', 'Aa', 'bB', 'cC']
    

    The solution is based on the inline modifier group (?i:...): everything inside it is matched case insensitively, while the rest of the pattern stays case sensitive (provided no other (?i) or re.I flag is in effect).

    Details

    • (\p{L}) - any letter captured into Group 1
    • (?!\1) - a negative lookahead that fails the match if the next char is exactly identical to the one captured in Group 1. Note that the regex index is still right after the char captured with (\p{L}).
    • (?i:\1) - a case insensitive modifier group that contains a backreference to the value of Group 1. Since it matches case insensitively, it could match both a and A, but the preceding lookahead has already excluded the variant with the same case (the \1 inside the lookahead is matched case sensitively), so only the opposite-case char can match here.
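    To make the three steps above concrete, here is a minimal plain-Python sketch of the same logic without any regex. The function name find_case_swapped_pairs is hypothetical, str.isalpha() stands in for \p{L}, and comparing with str.lower() is only an approximation of the engine's case-insensitive matching.

```python
def find_case_swapped_pairs(text):
    """Return adjacent pairs of the same letter in opposite case (hypothetical helper)."""
    pairs = []
    i = 0
    while i < len(text) - 1:
        first, second = text[i], text[i + 1]
        is_letter = first.isalpha()                            # (\p{L})
        not_identical = first != second                        # (?!\1)
        same_ignoring_case = first.lower() == second.lower()   # (?i:\1), approximated
        if is_letter and not_identical and same_ignoring_case:
            pairs.append(first + second)
            i += 2  # regex matches do not overlap, so skip past the pair
        else:
            i += 1
    return pairs


print(find_case_swapped_pairs(' aA, Aa, bB, cC but not aB, aa, AA, aC, Ca'))
# => ['aA', 'Aa', 'bB', 'cC']
```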

    What about a re solution?

    In re before Python 3.6, you cannot make only part of a pattern case insensitive: (?i) anywhere in the pattern makes all of it case insensitive, and modifier groups such as (?i:...) are not supported (they were added to re in Python 3.6).

    You may use something like

    import re
    rx = r'(?i)([^\W\d_])(\1)'
    print([x.group() for x in re.finditer(rx, ' aA, Aa, bB, cC but not aB, aa, AA, aC, Ca') if x.group(1) != x.group(2)])
    

    See the Python demo.

    • (?i) - set the whole regex case insensitive
    • ([^\W\d_]) - a letter is captured into Group 1
    • (\1) - the same letter is captured into Group 2 (case insensitive, so Aa, aA, aa and AA will match).

    The if x.group(1) != x.group(2) condition filters out the unwanted matches.
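    On Python 3.6 and later, re accepts scoped inline flags, so the lookahead pattern from the regex-module solution can be adapted by swapping \p{L} (which re does not support) for [^\W\d_]. A minimal sketch, assuming Python 3.6+:

```python
import re

# [^\W\d_] stands in for \p{L}; (?i:...) needs Python 3.6 or later.
rx = re.compile(r'([^\W\d_])(?!\1)(?i:\1)')

text = ' aA, Aa, bB, cC but not aB, aa, AA, aC, Ca'
matches = [m.group() for m in rx.finditer(text)]
print(matches)
# => ['aA', 'Aa', 'bB', 'cC']
```

    One difference from filtering after the match: this version rejects same-case pairs while matching, so in 'aaA' it still finds 'aA', whereas the filtered version consumes 'aa' first and then has nothing left to pair with 'A'.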

  • 2020-12-20 07:36

    This can be done with re:

    import re
    import string
    
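    # Build "aA|bB|...|zZ": each ASCII lowercase letter followed by its uppercase form
    pattern = re.compile('|'.join(lo + up for lo, up in zip(string.ascii_lowercase, string.ascii_uppercase)))
    your_text = ' aA, Aa, bB, cC but not aB, aa, AA, aC, Ca'  # sample input
    print(pattern.findall(your_text))  # finds only the lower-then-upper pairs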
    

    If you're looking for a repeated letter that switches case (either lower to upper or upper to lower), then you can use:

    pairs = list(zip(string.ascii_uppercase, string.ascii_lowercase))
    pattern = re.compile('|'.join([up + lo for up, lo in pairs] + [lo + up for up, lo in pairs]))
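    As a quick check of the two-direction alternation, here is a self-contained sketch. The helper name build_pair_pattern is hypothetical, and the pattern covers only the 26 ASCII letters, so pairs such as éÉ are not matched.

```python
import re
import string


def build_pair_pattern():
    """Compile 'Aa|aA|Bb|bB|...' for the 26 ASCII letters (hypothetical helper)."""
    alternatives = []
    for lo, up in zip(string.ascii_lowercase, string.ascii_uppercase):
        alternatives.append(up + lo)  # upper then lower, e.g. 'Aa'
        alternatives.append(lo + up)  # lower then upper, e.g. 'aA'
    return re.compile('|'.join(alternatives))


pattern = build_pair_pattern()
matches = pattern.findall(' aA, Aa, bB, cC but not aB, aa, AA, aC, Ca')
print(matches)
# => ['aA', 'Aa', 'bB', 'cC']
```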
    