Python 3.7.4: 're.error: bad escape \s at position 0'

耶瑟儿~ 2020-12-20 07:05

My program looks something like this:

import re
# Escape the string, in case it happens to have re metacharacters
my_str = "The quick brown fox jumped"
esc         


        
4 answers
  • 2020-12-20 07:14

    Double the backslash in the replacement string so that the regex engine does not try to interpret \s as an escape sequence:

    spaced_pattern = re.sub(r"\\\s+", r"\\s+", escaped_str)
    

    Now:

    >>> spaced_pattern
    'The\\s+quick\\s+brown\\s+fox\\s+jumped'
    >>> print(spaced_pattern)
    The\s+quick\s+brown\s+fox\s+jumped
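
    For reference, here is a minimal end-to-end sketch of this fix. It assumes my_str from the question and that escaped_str is the result of re.escape(my_str); the sample input is a hypothetical value added only to show that the resulting pattern is usable.

```python
import re

my_str = "The quick brown fox jumped"

# Escape the string, in case it happens to have re metacharacters.
escaped_str = re.escape(my_str)

# r"\\s+" is a replacement template containing an escaped backslash,
# so re.sub inserts a literal backslash followed by "s+".
spaced_pattern = re.sub(r"\\\s+", r"\\s+", escaped_str)
print(spaced_pattern)

# The result is itself a pattern that tolerates any run of whitespace.
sample = "The  quick\tbrown\nfox jumped"  # hypothetical input
matched = re.fullmatch(spaced_pattern, sample) is not None
print(matched)
```

    The search pattern r"\\\s+" reads as "a literal backslash followed by one or more whitespace characters", which is what re.escape produces for each space.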
    

    But why?

    It seems that re.sub processes escape sequences in the replacement string itself, so it tries to interpret \s the same way it interprets \n, instead of leaving it alone as Python's string-literal parser does for a raw string. For example:

    re.sub(r"\\\s+", r"\n+", escaped_str)
    

    yields:

    The
    +quick
    +brown
    +fox
    +jumped
    

    even if \n was used in a raw string.
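
    A small sketch of that behaviour, plus one possible way around it: when the replacement is a callable, re.sub inserts its return value without processing escapes. The lambda is an illustration and does not come from the question.

```python
import re

escaped_str = re.escape("The quick brown fox jumped")

# String replacement: re.sub processes the template, so the two
# characters backslash + n become a real newline, raw string or not.
with_newlines = re.sub(r"\\\s+", r"\n+", escaped_str)

# Callable replacement: the returned string is inserted as-is,
# so the backslash survives untouched.
verbatim = re.sub(r"\\\s+", lambda match: r"\s+", escaped_str)

print(repr(with_newlines))
print(verbatim)
```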

    The change was introduced in Issue #27030: Unknown escapes consisting of '\' and ASCII letter in regular expressions now are errors.

    The code that does the replacement is in sre_parse.py (python 3.7):

            else:
                try:
                    this = chr(ESCAPES[this][1])
                except KeyError:
                    if c in ASCIILETTERS:
                        raise s.error('bad escape %s' % this, len(this))
    

    This code looks at the character that follows a literal \ and tries to replace the pair with the corresponding control character (for example, \n becomes a newline). \s is not in the ESCAPES dictionary, so the lookup raises KeyError, and because s is an ASCII letter the handler raises the error you are seeing.
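
    To see the error in isolation, the following sketch reproduces it and captures the message. It assumes Python 3.7 or later; on 3.6 the same call only emits a DeprecationWarning, so message would stay None.

```python
import re

escaped_str = re.escape("The quick brown fox jumped")

message = None
try:
    # r"\s+" is an unknown escape in a replacement template.
    re.sub(r"\\\s+", r"\s+", escaped_str)
except re.error as exc:
    message = str(exc)

print(message)
```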

    On previous versions it just issued a warning:

    import warnings
    warnings.warn('bad escape %s' % this,
                  DeprecationWarning, stacklevel=4)
    

    It looks like we are not the only ones bitten by the 3.6 to 3.7 upgrade: https://github.com/gi0baro/weppy/issues/227

  • 2020-12-20 07:20

    Try the third-party regex module: write import regex as re instead of import re.

  • 2020-12-20 07:20

    I guess you might be trying to do:

    import re
    # Escape the string, in case it happens to have re metacharacters
    my_str = "The\\ quick\\ brown\\ fox\\ jumped"
    escaped_str = re.escape(my_str)
    # escaped_str now prints as: The\\\ quick\\\ brown\\\ fox\\\ jumped
    # Replace escaped space patterns with a generic white space pattern
    print(re.sub(r"\\\\\\\s+", " ", escaped_str))
    

    Output 1

    The quick brown fox jumped
    

    If you want a literal \s+ in the result instead, then maybe try:

    import re
    # Escape the string, in case it happens to have re metacharacters
    my_str = "The\\ quick\\ brown\\ fox\\ jumped"
    escaped_str = re.escape(my_str)
    print(re.sub(r"\\\\\\\s+", re.escape(r"\s") + '+', escaped_str))
    

    Output 2

    The\s+quick\s+brown\s+fox\s+jumped
    

    Or maybe:

    import re
    # Escape the string, in case it happens to have re metacharacters
    my_str = "The\\ quick\\ brown\\ fox\\ jumped"
    print(re.sub(r"\s+", "s+", my_str))
    

    Output 3

    The\s+quick\s+brown\s+fox\s+jumped
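
    A different route, not taken from any answer in this thread, is to avoid re.sub altogether: split the original string on whitespace, escape each word, and join the pieces with \s+. This sketch assumes the unescaped my_str from the question; str.join never interprets backslashes, so no replacement template is involved.

```python
import re

my_str = "The quick brown fox jumped"

# Escape each word separately, then glue the words together with a
# generic whitespace pattern.
spaced_pattern = r"\s+".join(re.escape(word) for word in my_str.split())
print(spaced_pattern)

# Metacharacters inside a word are still escaped.
dotted_pattern = r"\s+".join(re.escape(word) for word in "a.b c".split())
print(dotted_pattern)
```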
    

    If you wish to simplify, modify, or explore the expression, paste it into regex101.com, which explains each token in its top-right panel and shows how the pattern matches against sample inputs.

    jex.im can also visualize regular expressions as a diagram.

  • 2020-12-20 07:26

    Regex engines behave (mostly) the same way when it comes to the replacement strings
    that are handed to them.
    They try to insert the control-code equivalent of escaped characters, such as tabs, CR/LFs, etc.
    When an engine doesn't recognize an escape sequence, it just strips off the escape.

    Given
    spaced_pattern = re.sub(r"\\\s+", r"\s+", escaped_str)

    the r"\s+" hands the engine this replacement string \s+.
    Since there is no such escape sequence, it just strips off the escape
    and inserts s+ into the replace position.

    You can see it here: https://regex101.com/r/42QCvi/1
    No error is thrown there, but one should be, since you're not getting what you think you should.

    In reality, a literal escape character (the backslash) should always be escaped,
    as can be seen here: https://regex101.com/r/bzQgfN/1

    Nothing new; they just call it an error, but it's really a warning
    that you're not getting what you think.
    It's been this way for years and years. Sometimes it's an error, sometimes not.
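
    In Python, "escaping the escape" for an arbitrary replacement string can be sketched as a one-line helper that doubles every backslash. literal_repl is a hypothetical name chosen for this illustration; it is not part of the re module.

```python
import re


def literal_repl(text):
    """Double each backslash so re.sub inserts text verbatim.

    Hypothetical helper for illustration, not a standard-library function.
    """
    return text.replace("\\", "\\\\")


escaped_str = re.escape("The quick brown fox jumped")
spaced_pattern = re.sub(r"\\\s+", literal_repl(r"\s+"), escaped_str)
print(spaced_pattern)

# Known escapes are protected too: this inserts backslash + n,
# not a newline character.
kept_literal = re.sub("-", literal_repl(r"\n"), "a-b")
print(kept_literal)
```

    Only backslashes need doubling here, because every special sequence in a replacement template (escapes and group references alike) starts with a backslash.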
