Suppose I have something like:
a = "Gżegżółka"
a = bytes(a, 'utf-8')
a = str(a)
which returns a string of the form:
b'G\xc5\xb
If you want to encode and decode text, that's what the encode and decode methods are for:
>>> a = "Gżegżółka"
>>> b = a.encode('utf-8')
>>> b
b'G\xc5\xbceg\xc5\xbc\xc3\xb3\xc5\x82ka'
>>> c = b.decode('utf-8')
>>> c
'Gżegżółka'
Also, notice that UTF-8 is already the default, so you can just do this:
>>> b = a.encode()
>>> c = b.decode()
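To make the defaults concrete, here is a rough sketch of what changes when you do pass arguments. The cp1250 code page and the truncated byte string are just examples I picked for illustration; they are not from the question.

```python
a = "Gżegżółka"

# With no arguments, encode() and decode() use UTF-8 with errors='strict'.
assert a.encode() == a.encode("utf-8", "strict")
assert a.encode().decode() == a

# A different encoding produces different bytes. cp1250 is a single-byte
# Central European code page, used here only as an example.
b_utf8 = a.encode()
b_cp1250 = a.encode("cp1250")
print(len(b_utf8), len(b_cp1250))

# A non-default error handler changes what happens on malformed input.
bad = b"G\xc5"  # UTF-8 sequence cut off in the middle (illustrative)
try:
    bad.decode()  # 'strict' raises
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# 'replace' substitutes U+FFFD instead of raising.
replaced = bad.decode(errors="replace")
print(repr(replaced))
```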
You need to specify arguments only in cases such as using an encoding other than UTF-8, or an error handler like 'surrogateescape' instead of 'strict'.
However, if you really want to, you can do what you were already doing; you just need to explicitly specify the encoding in the str call, just as you did in the bytes call:
>>> a = "Gżegżółka"
>>> b = bytes(a, 'utf-8')
>>> b
b'G\xc5\xbceg\xc5\xbc\xc3\xb3\xc5\x82ka'
>>> c = str(b, 'utf-8')
>>> c
'Gżegżółka'
Calling str on a bytes object without an encoding, as you were doing, doesn't decode it, and doesn't raise an exception like calling bytes on a str without an encoding, because the main job of str is to give you a string representation of the object, and the best string representation of a bytes object is that b'…' form.
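The asymmetry between the two constructors can be checked directly. This is a minimal sketch; the variable names are mine.

```python
b = "Gżegżółka".encode("utf-8")

# str() with no encoding falls back to repr(): you get text that merely
# looks like a bytes literal, not the decoded string.
s = str(b)
assert s == repr(b)
print(s)

# str() with an encoding actually decodes the bytes.
decoded = str(b, "utf-8")
print(decoded)

# bytes() on a str has no such fallback: leaving out the encoding
# raises TypeError.
try:
    bytes("Gżegżółka")
    raised = False
except TypeError:
    raised = True
print(raised)
```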
I found it. The simplest way to convert the string representation of bytes back to bytes is with the eval function:
a = "Gżegżółka"
a = bytes(a, 'utf-8')
a = str(a)  # this is the input we deal with
a = eval(a)  # that's how we transform a into bytes
a = str(a, 'utf-8')  # ...and now we convert it into a string
print(a)
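If the b'…' string might come from somewhere you don't control, a variant worth considering is ast.literal_eval from the standard library, which only accepts Python literals and refuses anything else. A sketch of the same round trip, with variable names of my own choosing:

```python
import ast

# The "b'...'" text we start from, built the same way as in the question.
text = str("Gżegżółka".encode("utf-8"))

# literal_eval parses literals only (bytes, str, numbers, ...), so it
# will not execute arbitrary expressions the way eval does.
raw = ast.literal_eval(text)
assert isinstance(raw, bytes)

result = raw.decode("utf-8")
print(result)

# Anything that is not a plain literal is rejected with ValueError.
try:
    ast.literal_eval("__import__('os')")
    rejected = False
except ValueError:
    rejected = True
```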