Regular expression to replace “escaped” characters with their originals

囚心锁ツ 2021-01-24 11:45

NOTE: I'm not parsing lots of or html or generic html with regex. I know that's bad.

TL;DR:

I have strings like

1 answer

情歌与酒 (OP)

2021-01-24 12:13
You are missing something, namely the r prefix:
```
r = re.compile(r"\\.") # Backslash followed by any character
```
Both Python and re attach meaning to the backslash. Your doubled backslash becomes a single backslash before the string value ever reaches re.compile(), so re sees \., which means a literal full stop:
```
>>> print("""\\.""")
\.
```
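To see the difference concretely, here is a small check (a sketch; the sample text `a.b \c` is made up for illustration). The non-raw pattern ends up as just two characters, and it matches only the full stop, never the backslash:

```python
import re

# Without the r prefix, Python collapses "\\." into two characters: \ and .
pattern_text = "\\."
print(len(pattern_text))  # 2

# re then sees \. which means "a literal full stop", so the backslash
# in the sample text is never matched.
non_raw = re.compile(pattern_text)
matches = non_raw.findall("a.b \\c")  # the sample text is: a.b \c
print(matches)  # ['.']
```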
By using r'' you tell Python not to interpret escape codes, so re now receives the string \\., meaning a literal backslash followed by any character:
```
>>> print(r"""\\.""")
\\.
```
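A quick way to confirm what re actually receives (a sketch): the raw literal keeps both backslashes, so it is the same three-character string you would get by writing four backslashes in a normal literal.

```python
# The raw literal keeps every backslash, so re receives \\. (three characters)
raw_text = r"\\."
print(len(raw_text))  # 3

# Without r you would have to double each backslash again to get the same string
print(raw_text == "\\\\.")  # True
print(list(raw_text))  # ['\\', '\\', '.']
```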
Demo:
```
>>> import re
>>> s = "test \\* \\! test * !! **"
>>> r = re.compile(r"\\.") # Backslash followed by any character
>>> r.sub("-", s)
'test - - test * !! **'
```
The rule of thumb: when defining regular expressions, use r'' raw string literals. They save you from having to double-escape everything that has meaning in both Python and regular-expression syntax.
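The same trap shows up with any escape that means different things to Python and to re. As an illustration (my own example, not from the question), `\b` is a backspace character in a normal Python string but a word boundary in a regular expression:

```python
import re

text = "cat concatenate cat"

# In a normal string literal, Python turns \b into a backspace character (\x08),
# so this pattern searches for backspaces and finds nothing.
print("\b" == "\x08")               # True
print(re.findall("\bcat\b", text))  # []

# With the r prefix, re receives \b and treats it as a word boundary,
# so the "cat" inside "concatenate" is skipped.
print(re.findall(r"\bcat\b", text))  # ['cat', 'cat']
```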

Next, you want to replace each 'escaped' sequence with the original character. Use a capturing group for that: re.sub() lets you reference groups in the replacement value:
```
r = re.compile(r"\\(.)") # Note the parentheses, that's a capturing group
r.sub(r'\1', s)          # \1 means: replace with value of first capturing group
```
Now the output is:
```
>>> r = re.compile(r"\\(.)") # Note the parentheses, that's a capturing group
>>> r.sub(r'\1', s)
'test * ! test * !! **'
```
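If you need this in more than one place, it can be wrapped in a small helper. This is a sketch: the name `unescape` is hypothetical, and `re.DOTALL` is my addition so that a backslash followed by a newline is unescaped too (by default `.` does not match a newline). A lone backslash at the very end of the string has nothing after it, so the pattern leaves it untouched.

```python
import re

# Same pattern as above; re.DOTALL (an addition) lets "." also match a newline.
_ESCAPED = re.compile(r"\\(.)", re.DOTALL)


def unescape(text: str) -> str:
    """Replace every backslash-plus-character pair with just the character."""
    # \g<1> means the same as \1 but can't be misread if a digit follows it
    return _ESCAPED.sub(r"\g<1>", text)


print(unescape("test \\* \\! test * !! **"))  # test * ! test * !! **
print(unescape("a\\\\b"))      # an escaped backslash becomes one: a\b
print(unescape("trailing\\"))  # nothing follows the backslash: trailing\
```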