Python3: Decode UTF-8 bytes converted as string

迷失自我 2021-01-27 06:01

Suppose I have something like:

a = "Gżegżółka"
a = bytes(a, 'utf-8')
a = str(a)

which returns a string in the form:

b'G\xc5\xb

2 Answers
  • 2021-01-27 06:39

    If you want to encode and decode text, that's what the encode and decode methods are for:

    >>> a = "Gżegżółka"
    >>> b = a.encode('utf-8')
    >>> b
    b'G\xc5\xbceg\xc5\xbc\xc3\xb3\xc5\x82ka'
    >>> c = b.decode('utf-8')
    >>> c
    'Gżegżółka'
    

    Also, notice that UTF-8 is already the default, so you can just do this:

    >>> b = a.encode()
    >>> c = b.decode()
    

    The only reasons you need to specify arguments are:

    • You need to use some other encoding instead of UTF-8,
    • You need a specific error handler, such as 'surrogateescape' instead of the default 'strict', or
    • Your code has to run in Python 3.0-3.1 (which almost nobody used).
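A short sketch of the first two cases in the list above, using only the standard codecs that ship with Python: the argument-free default (UTF-8), a different encoding (ISO-8859-2, chosen here only as an example of another codec that covers Polish letters), and the standard 'replace' and 'surrogateescape' error handlers on bytes that are deliberately not valid UTF-8. The byte string `raw` is a made-up example, not from the question.

```python
# Polish text used throughout the question and answers.
word = "Gżegżółka"

# With no arguments, encode() uses UTF-8, so these are identical.
assert word.encode() == word.encode("utf-8")

# A different encoding: ISO-8859-2 (Latin-2) also covers Polish letters,
# but produces different bytes, so the same codec must be used to decode.
latin2 = word.encode("iso-8859-2")
assert latin2 != word.encode("utf-8")
assert latin2.decode("iso-8859-2") == word

# Hypothetical input that is not valid UTF-8: 0xff never appears in UTF-8.
raw = b"G\xc5\xbceg\xff"

# 'strict' (the default handler) raises UnicodeDecodeError on the bad byte.
try:
    raw.decode("utf-8")
    strict_error = None
except UnicodeDecodeError as exc:
    strict_error = exc.reason
print("strict:", strict_error)

# 'replace' substitutes U+FFFD REPLACEMENT CHARACTER for the bad byte.
replaced = raw.decode("utf-8", "replace")

# 'surrogateescape' carries the bad byte through as a lone surrogate,
# so encoding with the same handler gives back the original bytes.
escaped = raw.decode("utf-8", "surrogateescape")
assert escaped.encode("utf-8", "surrogateescape") == raw
```

'replace' loses information, while 'surrogateescape' round-trips it, which is why the latter is the usual choice when bytes of unknown encoding must pass through a str unchanged.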

    However, if you really want to, you can do what you were already doing; you just need to explicitly specify the encoding in the str call, just as you did in the bytes call:

    >>> a = "Gżegżółka"
    >>> b = bytes(a, 'utf-8')
    >>> b
    b'G\xc5\xbceg\xc5\xbc\xc3\xb3\xc5\x82ka'
    >>> c = str(b, 'utf-8')
    >>> c
    'Gżegżółka'

    Calling str on a bytes object without an encoding, as you were doing, doesn't decode it. Unlike calling bytes on a str without an encoding, it doesn't raise an exception either, because the main job of str is to give you a string representation of the object, and the best string representation of a bytes object is that b'…'.
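To make the difference concrete, a minimal sketch: str(b) with no encoding is just the repr of the bytes object, str(b, 'utf-8') decodes exactly like b.decode('utf-8'), and bytes(...) on a str with no encoding raises TypeError. The variable names are illustrative only.

```python
b = "Gżegżółka".encode("utf-8")

# Without an encoding, str() returns the repr of the bytes object.
shown = str(b)
assert shown == repr(b)
print(shown)  # b'G\xc5\xbceg\xc5\xbc\xc3\xb3\xc5\x82ka'

# With an encoding, str() actually decodes, the same as bytes.decode().
decoded = str(b, "utf-8")
assert decoded == b.decode("utf-8")

# By contrast, bytes() on a str without an encoding raises TypeError.
try:
    bytes("Gżegżółka")
    bytes_error = None
except TypeError as exc:
    bytes_error = exc
print("TypeError:", bytes_error)
```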

  • 2021-01-27 06:41

    I found it. The simplest way to convert the string representation of bytes back into bytes is with the built-in eval function:

    a = "Gżegżółka"
    a = bytes(a, 'utf-8')
    a = str(a)  # this is the input we deal with
    
    a = eval(a)  # that's how we transform a back into bytes
    a = str(a, 'utf-8')  # ...and now we convert it into a string
    
    print(a)
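eval will execute any Python expression it is given, so a variant worth considering when the input string comes from outside your program is the standard library's ast.literal_eval, which parses literals such as b'...' but refuses function calls. This is a swapped-in alternative, not the method from the answer above; a minimal sketch, assuming the input is always a single bytes literal:

```python
import ast

# Rebuild the input from the question: the repr of UTF-8 bytes, as a str.
a = str(bytes("Gżegżółka", "utf-8"))

# ast.literal_eval only evaluates Python literals (strings, bytes, numbers,
# containers, ...), so it parses b'...' but will not call functions.
raw = ast.literal_eval(a)
assert isinstance(raw, bytes)

text = raw.decode("utf-8")
print(text)  # Gżegżółka

# Anything that is not a plain literal is rejected with ValueError.
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
```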
    