Question
I would like a Python regular expression that matches a given word only when it is not between single quotes. I've tried using a negative lookahead, (?! ...), but without success.
In the sample below, I would like to match every foe except the quoted one on the 4th line.
Also, the text is given as one big string.
Here is the sample text:
var foe = 10;
foe = "";
dark_vador = 'bad guy'
foe = ' I\'m your father, foe ! '
bar = thingy + foe
Answer 1:
The regex solution below will work in most cases, but it might break if unbalanced single quotes appear outside of string literals, e.g. in comments.
A common regex trick for matching text in context is to match what you need to replace, and to match and capture what you need to keep.
Here is a sample Python demo:
import re
rx = r"('[^'\\]*(?:\\.[^'\\]*)*')|\b{0}\b"
s = r"""
var foe = 10;
foe = "";
dark_vador = 'bad guy'
foe = ' I\'m your father, foe ! '
bar = thingy + foe"""
toReplace = "foe"
res = re.sub(rx.format(toReplace), lambda m: m.group(1) if m.group(1) else 'NEWORD', s)
print(res)
The regex will look like
('[^'\\]*(?:\\.[^'\\]*)*')|\bfoe\b
The ('[^'\\]*(?:\\.[^'\\]*)*') part captures single-quoted string literals into Group 1; when it matches, the literal is simply put back into the result. The \bfoe\b part matches the whole word foe in any other context, and that match is then replaced with another word.
NOTE: To also match double-quoted string literals, use r"('[^'\\]*(?:\\.[^'\\]*)*'|\"[^\"\\]*(?:\\.[^\"\\]*)*\")" as the capturing group.
Answer 2:
You can try this:
((?!\'[\w\s]*)foe(?![\w\s]*\'))
Answer 3:
How about this regular expression:
>>> import re
>>> s = '''var foe = 10;
foe = "";
dark_vador = 'bad guy'
' I\m your father, foe ! '
bar = thingy + foe'''
>>>
>>> re.findall(r'(?!\'.*)foe(?!.*\')', s)
['foe', 'foe', 'foe']
The trick here is to make sure the expression does not match foe when it has a leading and a trailing ', and to account for the characters in between, hence the .* in each lookahead.
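To see where a lookahead-only pattern draws the line, the sketch below runs the same pattern on the question's original sample, where line 4 starts with an unquoted foe. The line-number bookkeeping is just an illustration and not part of the answer. Since . does not cross newlines, the trailing lookahead rejects any foe that has a single quote later on the same line, so the unquoted foe at the start of line 4 is skipped along with the quoted one.

```python
import re

pattern = re.compile(r"(?!\'.*)foe(?!.*\')")

sample = r"""var foe = 10;
foe = "";
dark_vador = 'bad guy'
foe = ' I\'m your father, foe ! '
bar = thingy + foe"""

# Record the 1-based line number on which each match starts.
matched_lines = [
    sample.count("\n", 0, m.start()) + 1 for m in pattern.finditer(sample)
]
print(matched_lines)  # [1, 2, 5]
```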
Answer 4:
((?!\'[\w\s]*[\\']*[\w\s]*)foe(?![\w\s]*[\\']*[\w\s]*\'))
Answer 5:
Capture group 1 of the following regular expression will contain matches of 'foe'.
r"^(?:[^'\n]|\\')*(?:(?<!\\)'(?:[^'\n]|\\')*(?:(?<!\\)')(?:[^'\n]|\\')*)*\b(foe)\b"
Python's regex engine performs the following operations.
^                  : assert beginning of string
(?:                : begin non-capture group
  [^'\n]           : match any char other than single quote and line terminator
  |                : or
  \\'              : match '\' then a single quote
)                  : end non-capture group
*                  : execute non-capture group 0+ times
(?:                : begin non-capture group
  (?<!\\)          : next char is not preceded by '\' (negative lookbehind)
  '                : match single quote
  (?:              : begin non-capture group
    [^'\n]         : match any char other than single quote and line terminator
    |              : or
    \\'            : match '\' then a single quote
  )                : end non-capture group
  *                : execute non-capture group 0+ times
  (?:              : begin non-capture group
    (?<!\\)        : next char is not preceded by '\' (negative lookbehind)
    '              : match single quote
  )                : end non-capture group
  (?:              : begin non-capture group
    [^'\n]         : match any char other than single quote and line terminator
    |              : or
    \\'            : match '\' then a single quote
  )                : end non-capture group
  *                : execute non-capture group 0+ times
)                  : end non-capture group
*                  : execute non-capture group 0+ times
\b(foe)\b          : match 'foe' in capture group 1
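As a rough check of the breakdown above, the sketch below applies the pattern to the question's sample. It assumes the re.MULTILINE flag, so that ^ anchors at the start of every line rather than only at the start of the whole string; the answer itself does not state which flags it expects. Because the pattern is anchored and consumes the prefix of the line, it reports at most one foe per line.

```python
import re

# Written as a double-quoted raw string because the pattern contains
# single quotes.
pattern = re.compile(
    r"^(?:[^'\n]|\\')*(?:(?<!\\)'(?:[^'\n]|\\')*(?:(?<!\\)')(?:[^'\n]|\\')*)*\b(foe)\b",
    re.MULTILINE,  # assumption: treat each line of the big string separately
)

sample = r"""var foe = 10;
foe = "";
dark_vador = 'bad guy'
foe = ' I\'m your father, foe ! '
bar = thingy + foe"""

# Group 1 holds the matched word; record the 1-based line it starts on.
matched_lines = [
    sample.count("\n", 0, m.start(1)) + 1 for m in pattern.finditer(sample)
]
print(matched_lines)  # [1, 2, 4, 5]
```

On line 4 the reported match is the unquoted foe at the start of the line; the one inside the quoted string cannot be reached, since only one unescaped quote precedes it and the pattern consumes unescaped quotes in pairs.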
Source: https://stackoverflow.com/questions/41137995/regular-expression-match-word-not-between-quotes