Why does re.sub in Python not work correctly on this test case?

后端未结

Try this code.

```
import re

test = ' az z bz z z stuff z  z '
re.sub(r'(\W)(z)(\W)', r'\1_\2\3', test)
```

This should replace all stand-alone `z`s with `_z`.
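Running the snippet shows the symptom: of the two adjacent stand-alone `z`s in `bz z z`, only the first is replaced. A minimal reproduction (the name `result` is only for illustration):

```python
import re

test = ' az z bz z z stuff z  z '

# The original pattern consumes one non-word character on each side of every z.
result = re.sub(r'(\W)(z)(\W)', r'\1_\2\3', test)
print(repr(result))  # ' az _z bz _z z stuff _z  _z '
```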

4 Answers

南旧

2021-01-21 07:20
This happens because the matches you want would overlap, and `re.sub` only replaces non-overlapping matches: the space between two adjacent `z`s would have to serve as the trailing `\W` of one match and the leading `\W` of the next. You need to avoid consuming that extra character. There are two ways to do this: one is the word boundary `\b`, as others have suggested; the other is a lookbehind assertion combined with a lookahead assertion. (In practice you should probably use `\b` instead; this solution is mainly here for educational purposes.)
```
>>> re.sub(r'(?<!\w)(z)(?!\w)', r'_\1', test)
' az _z bz _z _z stuff _z  _z '
```
`(?<!\w)` makes sure there is no `\w` immediately before the `z`.

`(?!\w)` makes sure there is no `\w` immediately after it.

The `(?...)` lookaround syntax does not create capturing groups, so `(z)` is group 1, referenced as `\1`.
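A quick way to confirm that the lookarounds are zero-width and non-capturing: the match span covers only the `z` itself, and the compiled pattern reports a single group. The sample string here is just a short excerpt of `test`:

```python
import re

pattern = re.compile(r'(?<!\w)(z)(?!\w)')
m = pattern.search(' az z bz')

# The lookarounds consume nothing, so the span is just the single 'z'.
print(m.span())        # (4, 5)
# Only (z) captures, so it is group 1 (\1 in the replacement string).
print(m.groups())      # ('z',)
print(pattern.groups)  # 1
```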

Here is a step-by-step view of why the original pattern fails.

Conceptually, the regex works through the string from left to right, replacing as it goes. At this point it is at these three characters:
```
' az _z bz z z stuff z  z '
          ^^^
```
It replaces them. Because that match consumed the trailing space, the next search starts after it, roughly like this:
```
' az _z bz _z z stuff z  z '
              ^^^ <- It starts matching here.
             ^ <- Not this character, it's been consumed by the last match
```
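The consumed space is easy to see by listing the match spans with `re.finditer`. On the question's `test` string, the `z` at index 11 is never the middle of a match, because the space at index 10 was already used by the previous match. As a sketch of the "don't match the extra character" idea, turning only the trailing `(\W)` into a lookahead `(?=\W)` is enough for this input (`spans` and `fixed` are illustrative names):

```python
import re

test = ' az z bz z z stuff z  z '

# Spans of the original pattern: each match eats the space on both sides.
spans = [m.span() for m in re.finditer(r'(\W)(z)(\W)', test)]
print(spans)  # [(3, 6), (8, 11), (18, 21), (21, 24)]

# A lookahead checks the trailing \W without consuming it,
# so the next match can reuse that space as its leading \W.
fixed = re.sub(r'(\W)(z)(?=\W)', r'\1_\2', test)
print(repr(fixed))  # ' az _z bz _z _z stuff _z  _z '
```

Note that `(?=\W)` still requires a character after the `z`, so a `z` at the very end of the string would be left alone; the `(?<!\w)(z)(?!\w)` version does not have that limitation.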
孤城傲影

2021-01-21 07:23
If your goal is to match `z` only when it is a standalone word, use `\b` to match word boundaries without consuming the surrounding whitespace:
```
>>> re.sub(r'\b(z)\b', r'_\1', test)
' az _z bz _z _z stuff _z  _z '
```
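Because `\b` is zero-width, it also matches at the very start and end of the string, where the question's `(\W)(z)(\W)` has no character to consume. A small comparison on a hypothetical string `edge` with no surrounding spaces:

```python
import re

edge = 'z az z'

# (\W) needs an actual non-word character, so a z at either end is skipped.
print(re.sub(r'(\W)(z)(\W)', r'\1_\2\3', edge))  # z az z
# \b matches at the string boundaries as well as between \w and \W.
print(re.sub(r'\b(z)\b', r'_\1', edge))          # _z az _z
```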
一个人的身影

2021-01-21 07:32
Use this:
```
import re

test = ' az z bz z z stuff z  z '
print(re.sub(r'\b(z)\b', r'_\1', test))
```
时光取名叫无心

2021-01-21 07:37
You want to avoid consuming the whitespace. Try the zero-width word boundary `\b`, like this:
```
re.sub(r'\bz\b', '_z', test)
```