I am looking for a Python regex for a variable phrase with the following properties:
(For the sake of example, let's assume the variable phrase here is taking the value
You may use
r'(?<![^\W_])and(?![^\W_])'
Compile with the re.I flag to enable case-insensitive matching.
Details
(?<![^\W_]) - the preceding char must not be a letter or digit
and - the keyword
(?![^\W_]) - the next char must not be a letter or digit
Python demo:
import re
strs = ['this_and', 'this.and', '(and)', '[and]', 'and^', ';And', 'land', 'andy']
phrase = "and"
rx = re.compile(r'(?<![^\W_]){}(?![^\W_])'.format(re.escape(phrase)), re.I)
for s in strs:
    print("{}: {}".format(s, bool(rx.search(s))))
Output:
this_and: True
this.and: True
(and): True
[and]: True
and^: True
;And: True
land: False
andy: False
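The demo above can be wrapped in a small reusable helper. This is only a sketch: the names make_phrase_regex and contains_phrase are hypothetical and not part of the answer. It assumes the phrase should always be matched literally, which is why re.escape is applied.

```python
import re

def make_phrase_regex(phrase):
    # Hypothetical helper: escape the phrase so characters such as "+" or "."
    # are matched literally, then forbid a letter or digit on either side.
    return re.compile(r'(?<![^\W_]){}(?![^\W_])'.format(re.escape(phrase)), re.I)

def contains_phrase(phrase, text):
    # True if the phrase occurs delimited by non-alphanumeric chars or string edges.
    return bool(make_phrase_regex(phrase).search(text))

print(contains_phrase("and", "this_and"))  # underscore counts as a separator
print(contains_phrase("and", "land"))      # preceded by a letter, so rejected
print(contains_phrase("c++", "(C++)"))     # special chars in the phrase are escaped
```

If the same phrase is checked against many strings, compiling once with make_phrase_regex and reusing the compiled object avoids rebuilding the pattern on every call.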
Here is a regex that might solve it:
Regex
(?<=[\W_]+|^)and(?=[\W_]+|$)
Example
import regex  # third-party module (pip install regex), not the built-in re
string = 'this_And'
test = regex.search(r'(?<=[\W_]+|^)and(?=[\W_]+|$)', string.lower())
print(test.group(0))
# prints 'and'
# No match
string = 'Andy'
test = regex.search(r'(?<=[\W_]+|^)and(?=[\W_]+|$)', string.lower())
print(test)
# prints None
strings = ["this_and", "this.and", "(and)", "[and]", "and^", ";And"]
pattern = r'(?<=[\W_]+|^)and(?=[\W_]+|$)'
matches = [regex.search(pattern, s.lower()) for s in strings]
print([m.group(0) for m in matches if m])
# prints ['and', 'and', 'and', 'and', 'and', 'and']
Explanation
[\W_]+ means that before the keyword (in the lookbehind (?<=...)) and after it (in the lookahead (?=...)) we accept only non-word symbols, plus the underscore _ (which counts as a word symbol but is accepted here as a separator).
|^ and |$ allow matches to lie at the edge of the string.
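To see why the underscore has to be listed explicitly, here is a small sketch comparing a plain \b word boundary with a separator class that includes _. It uses the built-in re module, so the lookbehind is written as (?:(?<=[\W_])|^), which is an assumed fixed-width restatement of the pattern above and not the exact regex from this answer.

```python
import re

text = "this_and"

# \b sees "_" as a word character, so there is no boundary between "_" and "a".
with_b = re.search(r'\band\b', text)

# Treating "_" as a separator (either a separator char or the string edge on each side).
with_sep = re.search(r'(?:(?<=[\W_])|^)and(?=[\W_]|$)', text)

print(with_b)             # no match object
print(with_sep.group(0))  # the keyword is found
```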
Edit
As mentioned in my comment, the regex module does not raise errors on variable-length lookbehinds (as opposed to re).
# This works fine
import regex
word = 'and'
pattern = r'(?<=[\W_]+|^){}(?=[\W_]+|$)'.format(word.lower())
string = 'this_And'
regex.search(pattern, string.lower())
However, if you insist on using re, then off the top of my head I'd suggest splitting the lookbehind in two, (?<=[\W_])and(?=[\W_]+|$)|^and(?=[\W_]+|$), so that cases where the string starts with and are captured as well.
# This also works fine
import re
word = 'and'
pattern = r'(?<=[\W_]){}(?=[\W_]+|$)|^{}(?=[\W_]+|$)'.format(word.lower(), word.lower())
string = 'this_And'
re.search(pattern, string.lower())
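As a rough sanity check, the split-lookbehind pattern can be compared with the negative-lookaround pattern from the other answer on the same sample strings. The function names below are hypothetical, and re.escape plus re.I are used in place of .lower(), which is an assumption about how the variable phrase should be handled.

```python
import re

def split_lookbehind(word):
    # Two alternatives: keyword after a separator char, or keyword at string start.
    w = re.escape(word)
    return re.compile(r'(?<=[\W_]){0}(?=[\W_]+|$)|^{0}(?=[\W_]+|$)'.format(w), re.I)

def negative_lookarounds(word):
    # Keyword not preceded and not followed by a letter or digit.
    return re.compile(r'(?<![^\W_]){}(?![^\W_])'.format(re.escape(word)), re.I)

samples = ['this_and', 'this.and', '(and)', '[and]', 'and^', ';And', 'land', 'andy', 'and']
a = split_lookbehind('and')
b = negative_lookarounds('and')

# Each tuple: (sample, result of split pattern, result of negative-lookaround pattern)
results = [(s, bool(a.search(s)), bool(b.search(s))) for s in samples]
for row in results:
    print(row)
```

On these samples both patterns give the same verdict for every string, which suggests either form can be used with re.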