Process escape sequences in a string in Python

隐瞒了意图╮ 2020-11-22 03:56

Sometimes when I get input from a file or the user, I get a string with escape sequences in it. I would like to process the escape sequences in the same way that Python processes escape sequences in string literals.

  •  太阳男子
    2020-11-22 04:51

    The actually correct and convenient answer for Python 3:

    >>> import codecs
    >>> myString = "spam\\neggs"
    >>> print(codecs.escape_decode(bytes(myString, "utf-8"))[0].decode("utf-8"))
    spam
    eggs
    >>> myString = "naïve \\t test"
    >>> print(codecs.escape_decode(bytes(myString, "utf-8"))[0].decode("utf-8"))
    naïve    test
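    A minimal sketch that wraps the one-liner above in a reusable helper. The name `decode_escapes` and its `encoding` parameter are hypothetical (they are not part of `codecs`), and the sketch assumes the input text round-trips through the chosen encoding, UTF-8 by default. Keep in mind that `codecs.escape_decode` itself is undocumented.

```python
import codecs


def decode_escapes(text: str, encoding: str = "utf-8") -> str:
    """Process backslash escapes in *text* the way a bytes literal would.

    Hypothetical helper: *encoding* is assumed to be the encoding used to
    round-trip the text, so escaped bytes like \\xce\\xb1 must be valid in it.
    """
    # escape_decode works on bytes and returns (decoded_bytes, bytes_consumed)
    raw, _consumed = codecs.escape_decode(text.encode(encoding))
    # Turn the unescaped bytes back into text with the same encoding
    return raw.decode(encoding)


if __name__ == "__main__":
    print(decode_escapes("spam\\neggs"))
    print(decode_escapes("naïve \\t test"))
```

    Because it goes through bytes, this handles the escapes a bytes literal supports (`\n`, `\t`, `\xNN`, octal, and so on) but leaves `\uXXXX` sequences untouched.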
    

    Details regarding codecs.escape_decode:

    • codecs.escape_decode is a bytes-to-bytes decoder.
    • codecs.escape_decode decodes ASCII escape sequences, such as b"\\n" -> b"\n" and b"\\xce" -> b"\xce".
    • codecs.escape_decode does not need to know the bytes object's encoding, but the encoding of the escaped bytes should match the encoding of the rest of the object.
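    A small sketch of the three points above. It shows that the result is still bytes, that the function returns a `(decoded_bytes, length_consumed)` tuple, and why the encoding of the escaped bytes matters when you finally decode to text. The sample bytes are chosen only for illustration.

```python
import codecs

# Backslash-n followed by two escaped bytes: 2 + 4 + 4 = 10 input bytes
escaped = b"\\n\\xce\\xb1"
decoded, consumed = codecs.escape_decode(escaped)

print(decoded)   # b'\n\xce\xb1' (still bytes; no text decoding applied)
print(consumed)  # 10

# CE B1 is the UTF-8 encoding of U+03B1 (Greek small alpha), so decoding
# with UTF-8 gives the intended text...
print(repr(decoded.decode("utf-8")))
# ...while decoding the same bytes as Latin-1 gives two unrelated characters.
print(repr(decoded.decode("latin-1")))
```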

    Background:

    • @rspeer is correct: unicode_escape is the wrong solution for Python 3. It decodes the escaped bytes and then decodes the resulting bytes to a str, but it receives no information about which codec to use for that second step, so it treats every non-escape byte as Latin-1. Non-ASCII UTF-8 input such as "naïve" therefore comes out garbled.
    • @Jerub is correct: avoid the AST or eval.
    • I first discovered codecs.escape_decode from an answer to "how do I .decode('string-escape') in Python3?". As that answer states, the function is currently not documented for Python 3.
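    To make the unicode_escape problem concrete, here is a sketch comparing it with escape_decode on the "naïve" example from above. The character "ï" is two bytes in UTF-8 (C3 AF), and unicode_escape maps each of those bytes through Latin-1 separately.

```python
import codecs

text = "naïve \\t test"
utf8 = text.encode("utf-8")  # b'na\xc3\xafve \\t test'

# unicode_escape processes the escape but reads every other byte as Latin-1,
# so the two UTF-8 bytes of "ï" become the two characters "Ã" and "¯".
wrong = utf8.decode("unicode_escape")
print(repr(wrong))

# escape_decode keeps the non-escape bytes intact, so we choose the codec.
right = codecs.escape_decode(utf8)[0].decode("utf-8")
print(repr(right))
```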
