Python RegEx that matches char followed/preceded by same char but uppercase/lowercase


野的像风

I am trying to build a regex that will match aA, Aa, bB, cC but won't match aB, aa, AA, aC, Ca.

- if we meet a lowercase letter we want to check


2 answers

我寻月下人不归

2020-12-20 07:32
You can do it with the PyPI regex module (the same pattern also works in Java, PCRE (PHP, R, Delphi), Perl and .NET, but not in ECMAScript (JavaScript, C++ std::regex) or RE2 (Go, Google Apps Script)) using
```
(\p{L})(?!\1)(?i:\1)
```
Here is a proof that it works in Python:
```
import regex
rx = r'(\p{L})(?!\1)(?i:\1)'
print([x.group() for x in regex.finditer(rx, ' aA, Aa, bB, cC but not aB, aa, AA, aC, Ca')])
# => ['aA', 'Aa', 'bB', 'cC']
```
The solution is based on the inline modifier group (?i:...): everything inside it is matched case insensitively, while the rest of the pattern stays case sensitive (provided no other (?i) or re.I is in effect).

Details
- (\p{L}) - any letter captured into Group 1
- (?!\1) - a negative lookahead that fails the match if the next char is exactly identical (same case) to the one captured in Group 1. The regex index stays right after the char captured with (\p{L}), because lookaheads consume no text
- (?i:\1) - a case insensitive modifier group that contains a backreference to the value of Group 1. Matched case insensitively, it could match both a and A, BUT the preceding lookahead has already excluded the same-case variant (its \1 is matched case sensitively), so only the opposite-case letter can match here.
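The logic described in the details above (two adjacent letters that differ as written but are equal once case is ignored) can be cross-checked without any regex engine. This is only a minimal sketch: the helper name `case_flipped_pairs` is hypothetical, and `str.lower()` is used as an approximation of case-insensitive comparison.

```python
def case_flipped_pairs(text):
    """Return adjacent pairs made of the same letter in opposite case."""
    pairs = []
    for first, second in zip(text, text[1:]):
        # first.isalpha()   plays the role of (\p{L})
        # first != second   plays the role of (?!\1)
        # lower() equality  plays the role of (?i:\1)
        if first.isalpha() and first != second and first.lower() == second.lower():
            pairs.append(first + second)
    return pairs


sample = ' aA, Aa, bB, cC but not aB, aa, AA, aC, Ca'
print(case_flipped_pairs(sample))
# => ['aA', 'Aa', 'bB', 'cC']
```

Unlike a finditer scan, this loop also reports overlapping pairs, so 'aAa' yields both 'aA' and 'Aa'.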
What about a re solution?

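In re, a bare (?i) cannot make only part of a pattern case insensitive: it applies to the whole pattern, and Python 3.11+ rejects it anywhere but the start. Scoped modifier groups such as (?i:...) are only available in re from Python 3.6 onward, so older versions need a different approach.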

You may use something like
```
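import re
# (?i) makes the whole pattern case insensitive, so aa and AA match too
rx = r'(?i)([^\W\d_])(\1)'
print([x.group() for x in re.finditer(rx, ' aA, Aa, bB, cC but not aB, aa, AA, aC, Ca') if x.group(1) != x.group(2)])
# => ['aA', 'Aa', 'bB', 'cC']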
```
Details
- (?i) - set the whole regex case insensitive
- ([^\W\d_]) - a letter is captured into Group 1
- (\1) - the same letter is captured into Group 2 (case insensitive, so Aa, aA, aa and AA will match).
The if x.group(1) != x.group(2) condition filters out the unwanted matches.
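On Python 3.6 and later, the standard re module accepts scoped inline flags such as (?i:...), so the lookahead pattern can be tried without the third-party module. A minimal sketch, assuming Python 3.6+ and swapping \p{L} (which re does not support) for the letter class [^\W\d_]:

```python
import re

# The scoped flag (?i:...) needs Python 3.6 or later.
# [^\W\d_] stands in for \p{L}, which re does not support.
rx = re.compile(r'([^\W\d_])(?!\1)(?i:\1)')

sample = ' aA, Aa, bB, cC but not aB, aa, AA, aC, Ca'
matches = [m.group() for m in rx.finditer(sample)]
print(matches)
# => ['aA', 'Aa', 'bB', 'cC']
```

One difference from the filtering approach: in an input like 'aaA' the (?i)([^\W\d_])(\1) pattern consumes 'aa' first, so the following 'aA' is never reported, whereas the lookahead version skips the first 'a' and still finds 'aA'.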

说谎

2020-12-20 07:36

This can be done with re:

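```
import re
import string

# Alternation of each lowercase ASCII letter followed by its uppercase form: aA|bB|...|zZ
pattern = re.compile('|'.join([''.join(i) for i in zip(list(string.ascii_lowercase), list(string.ascii_uppercase))]))
pattern.search(your_text)  # your_text is the string you want to search
```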

If you're looking for a repeated letter that switches case (either lower to upper or upper to lower), then you can use:

```
# Both orders: Aa|Bb|...|Zz followed by aA|bB|...|zZ
pattern = re.compile('|'.join([''.join(i) for i in zip(list(string.ascii_uppercase), list(string.ascii_lowercase))] + [''.join(i) for i in zip(list(string.ascii_lowercase), list(string.ascii_uppercase))]))
```
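To see what the two-direction alternation matches, it can be run against the sample string from the question. This is a self-contained sketch; the variable names `alternatives`, `sample` and `found` are illustrative only.

```python
import re
import string

lower, upper = string.ascii_lowercase, string.ascii_uppercase

# Build 'Aa|Bb|...|Zz|aA|bB|...|zZ' from ASCII letters only.
alternatives = [u + l for u, l in zip(upper, lower)] + [l + u for l, u in zip(lower, upper)]
pattern = re.compile('|'.join(alternatives))

sample = ' aA, Aa, bB, cC but not aB, aa, AA, aC, Ca'
found = pattern.findall(sample)
print(found)
# => ['aA', 'Aa', 'bB', 'cC']
```

Because the alternation is built from string.ascii_lowercase and string.ascii_uppercase, it only covers the 26 ASCII letters. A pair such as 'éÉ' is not matched.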
