My program looks something like this:
import re
# Escape the string, in case it happens to have re metacharacters
my_str = "The quick brown fox jumped"
escaped_str = re.escape(my_str)
Try adjusting the backslashes so that the regex engine does not try to interpret \s:
spaced_pattern = re.sub(r"\\\s+", r"\\s+", escaped_str)
Now:
>>> spaced_pattern
'The\\s+quick\\s+brown\\s+fox\\s+jumped'
>>> print(spaced_pattern)
The\s+quick\s+brown\s+fox\s+jumped
It seems that the re module tries to interpret \s in the replacement the same way it interprets r"\n", instead of leaving it alone as Python normally does with a raw string. For example:
re.sub(r"\\\s+", r"\n+", escaped_str)
yields:
The
+quick
+brown
+fox
+jumped
even though \n was written in a raw string.
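A minimal sketch of the behaviour described above, assuming Python 3.7 or later. It shows that the raw string only protects the backslash from Python's own string parser, while re.sub() still expands the replacement as a template. It also shows one way around that: a callable replacement, whose return value is inserted as-is. The variable names are only illustrative.

```python
import re

escaped_str = re.escape("The quick brown fox jumped")

# For Python itself, r"\n" is two characters: a backslash and the letter n.
assert len(r"\n") == 2

# re.sub() parses a string replacement as a template, so \n becomes a newline.
with_newlines = re.sub(r"\\\s+", r"\n", escaped_str)

# A callable replacement is not parsed as a template; its return value
# is inserted verbatim, so the backslash in \s+ survives.
spaced_pattern = re.sub(r"\\\s+", lambda match: r"\s+", escaped_str)

print(repr(with_newlines))
print(spaced_pattern)
```

The callable form avoids counting backslashes entirely, at the cost of being slightly less obvious to read.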
The change was introduced in Issue #27030: Unknown escapes consisting of '\' and ASCII letter in regular expressions now are errors.
The code that does the replacement is in sre_parse.py (Python 3.7):
else:
    try:
        this = chr(ESCAPES[this][1])
    except KeyError:
        if c in ASCIILETTERS:
            raise s.error('bad escape %s' % this, len(this))
This code looks at what follows a literal \ and tries to replace it with the corresponding special character (for example, a newline for \n). s is not in the ESCAPES dictionary, so the KeyError handler runs and, because s is an ASCII letter, raises the error you're getting.
On previous versions it just issued a warning:
import warnings
warnings.warn('bad escape %s' % this,
              DeprecationWarning, stacklevel=4)
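To see the difference on your own interpreter, here is a small sketch that runs the failing call and reports what happens. It assumes Python 3.7 or later, where the call is expected to raise re.error; on older versions the else branch would run instead.

```python
import re

escaped_str = re.escape("The quick brown fox jumped")

try:
    # r"\s+" as a replacement contains the unknown template escape \s.
    result = re.sub(r"\\\s+", r"\s+", escaped_str)
except re.error as exc:
    result = None
    message = str(exc)
    print("re.error:", message)
else:
    message = None
    print("no error:", result)
```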
It looks like we're not alone in suffering from the 3.6 to 3.7 upgrade: https://github.com/gi0baro/weppy/issues/227
Just try import regex as re instead of import re.
I guess you might be trying to do:
import re
# Escape the string, in case it happens to have re metacharacters
my_str = "The\\ quick\\ brown\\ fox\\ jumped"
escaped_str = re.escape(my_str)
# escaped_str is now: The\\\ quick\\\ brown\\\ fox\\\ jumped
# Replace escaped space patterns with a generic white space pattern
print(re.sub(r"\\\\\\\s+", " ", escaped_str))
The quick brown fox jumped
If you want a literal \s+ instead, then try this answer, or maybe:
import re
# Escape the string, in case it happens to have re metacharacters
my_str = "The\\ quick\\ brown\\ fox\\ jumped"
escaped_str = re.escape(my_str)
print(re.sub(r"\\\\\\\s+", re.escape(r"\s") + '+', escaped_str))
The\s+quick\s+brown\s+fox\s+jumped
Or maybe:
import re
# No escaping here: my_str already has a backslash before each space,
# so replacing the space with "s+" leaves \s+ behind
my_str = "The\\ quick\\ brown\\ fox\\ jumped"
print(re.sub(r"\s+", "s+", my_str))
The\s+quick\s+brown\s+fox\s+jumped
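Whichever variant you use to build it, the resulting pattern can be checked against a few inputs. The following is a rough sketch, assuming the goal is a pattern that tolerates any run of whitespace between the words; the sample strings are made up for illustration.

```python
import re

my_str = "The quick brown fox jumped"
escaped_str = re.escape(my_str)

# Doubling the backslash in the replacement yields a literal \s+.
spaced_pattern = re.sub(r"\\\s+", r"\\s+", escaped_str)

# Hypothetical sample inputs: single spaces, mixed whitespace, no whitespace.
samples = [
    "The quick brown fox jumped",
    "The  quick\tbrown\nfox   jumped",
    "Thequickbrownfoxjumped",
]

results = [bool(re.fullmatch(spaced_pattern, sample)) for sample in samples]
print(spaced_pattern)
print(results)
```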
If you wish to simplify, modify, or explore the expression, it is explained in the top-right panel of regex101.com. If you'd like, you can also watch in this link how it would match against some sample inputs.
Regex engines behave (mostly) the same way when it comes to the replacement strings that are handed to them.
They try to insert the control-code equivalent of escaped characters, such as tabs, CR/LFs, etc.
When an engine doesn't recognize an escape sequence, it typically just strips off the escape.
Given
spaced_pattern = re.sub(r"\\\s+", r"\s+", escaped_str)
the r"\s+" hands the engine this replacement string: \s+.
Since there is no such escape sequence, it just strips off the escape and inserts s+ into the replace position.
You can see it here https://regex101.com/r/42QCvi/1
There is no error thrown, but there should be, since you're not getting what you think you should.
In reality, a literal escape should always be escaped, as can be seen here: https://regex101.com/r/bzQgfN/1
Nothing new: they just call it an error, but it's really a notification warning that you're not getting what you think.
It's been this way for years and years. Sometimes it's an error, sometimes not.
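The advice that a literal backslash in a replacement should itself be escaped can be sketched in Python as follows. Note that literal_repl is a hypothetical helper written for this example, not part of the re module; it simply doubles every backslash so that re.sub() inserts the text unchanged.

```python
import re


def literal_repl(text):
    """Hypothetical helper: double each backslash so re.sub() inserts text verbatim."""
    return text.replace("\\", "\\\\")


escaped_str = re.escape("The quick brown fox jumped")

# literal_repl(r"\s+") is the four characters \\s+, which the replacement
# template turns back into the literal three characters \s+.
spaced_pattern = re.sub(r"\\\s+", literal_repl(r"\s+"), escaped_str)

print(spaced_pattern)
```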