I\'m using this code to get standard output from an external program:
>>> from subprocess import *
>>> command_stdout = Popen([\'ls\', \'-l
I think this way is easy:
>>> bytes_data = [112, 52, 52]
>>> "".join(map(chr, bytes_data))
'p44'
You need to decode the bytes object to produce a string:
>>> b"abcde"
b'abcde'
# utf-8 is used here because it is a very common encoding, but you
# need to use the encoding your data is actually in.
>>> b"abcde".decode("utf-8")
'abcde'
I think you actually want this:
>>> from subprocess import *
>>> command_stdout = Popen(['ls', '-l'], stdout=PIPE).communicate()[0]
>>> command_text = command_stdout.decode(encoding='windows-1252')
Aaron's answer was correct, except that you need to know which encoding to use. And I believe that Windows uses 'windows-1252'. It will only matter if you have some unusual (non-ASCII) characters in your content, but then it will make a difference.
By the way, the fact that it does matter is the reason that Python moved to using two different types for binary and text data: it can't convert magically between them, because it doesn't know the encoding unless you tell it! The only way YOU would know is to read the Windows documentation (or read it here).
You need to decode the byte string and turn it in to a character (Unicode) string.
On Python 2
encoding = 'utf-8'
'hello'.decode(encoding)
or
unicode('hello', encoding)
On Python 3
encoding = 'utf-8'
b'hello'.decode(encoding)
or
str(b'hello', encoding)
In Python 3, the default encoding is "utf-8"
, so you can directly use:
b'hello'.decode()
which is equivalent to
b'hello'.decode(encoding="utf-8")
On the other hand, in Python 2, encoding defaults to the default string encoding. Thus, you should use:
b'hello'.decode(encoding)
where encoding
is the encoding you want.
Note: support for keyword arguments was added in Python 2.7.
When working with data from Windows systems (with \r\n
line endings), my answer is
String = Bytes.decode("utf-8").replace("\r\n", "\n")
Why? Try this with a multiline Input.txt:
Bytes = open("Input.txt", "rb").read()
String = Bytes.decode("utf-8")
open("Output.txt", "w").write(String)
All your line endings will be doubled (to \r\r\n
), leading to extra empty lines. Python's text-read functions usually normalize line endings so that strings use only \n
. If you receive binary data from a Windows system, Python does not have a chance to do that. Thus,
Bytes = open("Input.txt", "rb").read()
String = Bytes.decode("utf-8").replace("\r\n", "\n")
open("Output.txt", "w").write(String)
will replicate your original file.