Question
I have a file containing UTF-16 strings. When I read it, double quotes (" ") are added and the string looks like "b'\\xff\\xfeA\\x00'". The built-in .decode function throws an AttributeError: 'str' object has no attribute 'decode'. I tried a few options, but none of them worked.
This is what the file I am reading from looks like
Answer 1:
Try this:
str.encode().decode()
Answer 2:
It looks like the file has been created by writing bytes literals to it, something like this:
some_bytes = b'Hello world'
with open('myfile.txt', 'w') as f:
    f.write(str(some_bytes))
This gets around the fact that attempting to write bytes to a file opened in text mode raises an error, but at the cost that the file now contains "b'Hello world'" (note the 'b' inside the quotes).
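As a rough illustration of the behaviour described above, the sketch below writes to a scratch file (the path and variable names are made up for the example). It shows that writing bytes in text mode fails, while writing str(some_bytes) succeeds but stores the repr of the bytes:

```python
import os
import tempfile

# Example UTF-16 bytes: a BOM followed by 'A' (byte order depends on the platform)
some_bytes = 'A'.encode('utf-16')

# Hypothetical scratch file, used only for this demonstration
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

error = None
with open(path, 'w') as f:
    try:
        f.write(some_bytes)       # bytes into a text-mode file
    except TypeError as exc:
        error = exc               # text-mode write() only accepts str
    f.write(str(some_bytes))      # succeeds, but writes the repr of the bytes

with open(path) as f:
    content = f.read()

print(type(error).__name__)
print(content)                    # starts with b' because the repr was stored
```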
The solution is to decode the bytes to str before writing:
some_bytes = 'Hello world'.encode('utf-16')  # example bytes that really are UTF-16
my_str = some_bytes.decode('utf-16')  # or whatever the encoding of the bytes might be
with open('myfile.txt', 'w') as f:
    f.write(my_str)
Or open the file in binary mode and write the bytes directly:
some_bytes = b'Hello world'
with open('myfile.txt', 'wb') as f:
    f.write(some_bytes)
Note that you will need to provide the correct encoding when you later open a file of encoded bytes in text mode:
with open('myfile.txt', encoding='utf-16') as f:  # Be sure to use the correct encoding
    text = f.read()
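If the file cannot be regenerated, one possible way to recover the text is to parse the stored literal back into bytes and then decode it. This is only a sketch: recover_text is a hypothetical helper, and it assumes the file holds exactly one bytes literal with single backslashes (as str(some_bytes) would have written it) and that the bytes are UTF-16.

```python
import ast


def recover_text(file_content, encoding='utf-16'):
    """Parse text such as "b'...'" back into bytes, then decode it."""
    # literal_eval only evaluates Python literals, so it does not run arbitrary code
    raw = ast.literal_eval(file_content.strip())
    if not isinstance(raw, bytes):
        raise ValueError('expected the file to contain a bytes literal')
    return raw.decode(encoding)


# What f.read() would return for a file written with f.write(str(some_bytes))
file_content = "b'\\xff\\xfeA\\x00'"
print(recover_text(file_content))  # prints A
```

If the file contains several literals, or doubled backslashes, the parsing step would need to be adapted.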
Consider running Python with the -b or -bb flag set to raise a warning or exception respectively to detect attempts to stringify bytes.
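A quick way to see the flag in action is to launch a child interpreter with -bb and let it call str() on a bytes object. This is a minimal sketch, assuming a CPython 3 interpreter that still supports the flag; the snippet passed to -c is purely illustrative.

```python
import subprocess
import sys

# Code for the child interpreter: stringify bytes without giving an encoding
code = "str(b'abc')"

# -bb turns the BytesWarning into an error, so the child exits with a failure status
result = subprocess.run(
    [sys.executable, '-bb', '-c', code],
    capture_output=True,
    text=True,
)

print(result.returncode)  # non-zero when the warning is raised as an error
print(result.stderr)      # traceback mentioning BytesWarning
```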
Source: https://stackoverflow.com/questions/65168223/how-to-decode-unicode-string-that-is-read-from-a-file-in-python