We're preparing to move to Python 3.4 and have added unicode_literals. Our code relies extensively on piping to/from external utilities using the subprocess module. The following code snippet works fine on Python 2.7 to pipe UTF-8 strings to a subprocess:
import subprocess
kw = {}
kw[u'stdin'] = subprocess.PIPE
kw[u'stdout'] = subprocess.PIPE
kw[u'stderr'] = subprocess.PIPE
kw[u'executable'] = u'/path/to/binary/utility'
args = [u'', u'-l', u'nl']
line = u'¡Basta Ya!'
popen = subprocess.Popen(args,**kw)
popen.stdin.write('%s\n' % line.encode(u'utf-8'))
# ...blah blah...
Adding unicode_literals, as below, makes the same code throw this error:
from __future__ import unicode_literals
import subprocess
kw = {}
kw[u'stdin'] = subprocess.PIPE
kw[u'stdout'] = subprocess.PIPE
kw[u'stderr'] = subprocess.PIPE
kw[u'executable'] = u'/path/to/binary/utility'
args = [u'', u'-l', u'nl']
line = u'¡Basta Ya!'
popen = subprocess.Popen(args,**kw)
popen.stdin.write('%s\n' % line.encode(u'utf-8'))
Traceback (most recent call last):
  File "test.py", line 138, in <module>
    exitcode = main()
  File "test.py", line 57, in main
    popen.stdin.write('%s\n' % line.encode('utf-8'))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)
Any suggestions to pass UTF-8 through the pipe?
'%s\n' is a unicode string when you use unicode_literals:
>>> line = u'¡Basta Ya!'
>>> '%s\n' % line.encode(u'utf-8')
'\xc2\xa1Basta Ya!\n'
>>> u'%s\n' % line.encode(u'utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)
What happens is that your encoded line value is implicitly decoded (with the default ASCII codec) so that it can be interpolated into the unicode '%s\n' string.
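To see that implicit step in isolation, here is a small Python 3 sketch that does explicitly what Python 2 does behind the scenes, assuming Python 2's default encoding is the usual 'ascii': decode the byte string first, then interpolate. The helper name py2_style_interpolate is hypothetical, chosen only for illustration.

```python
# Python 2 decodes a byte string with the default encoding ('ascii') before
# interpolating it into a unicode format string. This makes that step explicit.

def py2_style_interpolate(fmt, data):
    """Hypothetical helper: emulate Python 2's u'...' % bytes coercion."""
    return fmt % data.decode('ascii')

encoded = u'¡Basta Ya!'.encode('utf-8')  # b'\xc2\xa1Basta Ya!'

error = None
try:
    py2_style_interpolate(u'%s\n', encoded)
except UnicodeDecodeError as exc:
    error = exc
    print(exc)  # 'ascii' codec can't decode byte 0xc2 in position 0: ordinal not in range(128)

# Pure-ASCII bytes decode cleanly, which is why the problem only appears
# once the text contains non-ASCII characters.
print(py2_style_interpolate(u'%s\n', b'Basta Ya!'))
```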
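You'll have to use a byte string instead; prefix the string with b. This works on Python 2, but % formatting of bytes is only available on Python 3.5 and later (PEP 461), so on Python 3.4 it raises a TypeError: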
>>> from __future__ import unicode_literals
>>> line = u'¡Basta Ya!'
>>> b'%s\n' % line.encode(u'utf-8')
'\xc2\xa1Basta Ya!\n'
or encode after interpolation:
>>> line = u'¡Basta Ya!'
>>> ('%s\n' % line).encode(u'utf-8')
'\xc2\xa1Basta Ya!\n'
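A third option, sketched here under the assumption that the same code must run on Python 2 and on every Python 3 release including 3.4: concatenate the encoded text with a bytes newline, which needs no % formatting on bytes at all. The name encode_line is a hypothetical helper.

```python
import sys

def encode_line(text):
    """Hypothetical helper: text plus a trailing newline, as UTF-8 bytes."""
    # bytes + bytes concatenation works on Python 2 and all Python 3 versions,
    # unlike b'%s\n' % ..., which Python 3.0-3.4 reject.
    return text.encode('utf-8') + b'\n'

line = u'¡Basta Ya!'
data = encode_line(line)  # b'\xc2\xa1Basta Ya!\n'

# Where bytes %-formatting exists (Python 2, Python 3.5+), the b'%s\n' form
# gives the same bytes.
if sys.version_info[0] == 2 or sys.version_info >= (3, 5):
    same = (b'%s\n' % line.encode(u'utf-8')) == data
```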
In Python 3, you'll have to write bytestrings to pipes anyway.
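A minimal Python 3 sketch of that rule. It assumes a stand-in child process (a Python one-liner that echoes its stdin bytes to stdout) in place of /path/to/binary/utility, and the helper names are hypothetical. Writing a str to a pipe opened in the default binary mode raises TypeError, while UTF-8 bytes pass through unchanged.

```python
import subprocess
import sys

# Hypothetical stand-in for /path/to/binary/utility: a Python child that
# copies its stdin bytes unchanged to stdout.
ECHO = [sys.executable, '-c',
        'import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())']

def str_write_rejected(text):
    """Return True if writing a str to a binary-mode pipe raises TypeError."""
    p = subprocess.Popen(ECHO, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    try:
        p.stdin.write(text)  # binary pipe: only bytes-like objects are accepted
        rejected = False
    except TypeError:
        rejected = True
    p.communicate()  # close stdin, drain stdout, wait for the child to exit
    return rejected

def echo_bytes(data):
    """Send bytes through the pipe and return the bytes the child writes back."""
    p = subprocess.Popen(ECHO, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    out, _ = p.communicate(data)
    return out

rejected = str_write_rejected(u'¡Basta Ya!\n')             # True
echoed = echo_bytes(u'¡Basta Ya!\n'.encode('utf-8'))       # b'\xc2\xa1Basta Ya!\n'
```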
If utf-8 stands for your locale's encoding, then to communicate using Unicode strings on Python 3 you could use universal_newlines=True:
#!/usr/bin/env python3
from subprocess import Popen, PIPE
p = Popen(['/path/to/binary/utility', '-l', 'nl'],
          stdin=PIPE, stdout=PIPE, stderr=PIPE,
          universal_newlines=True)
out, err = p.communicate('¡Basta Ya!')
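The code works even if the locale's encoding is not utf-8, as long as the utility uses the locale's encoding too: Python encodes the input and decodes the output with it. Input and output are Unicode strings here (str type).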
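The same universal_newlines=True setup can be observed end to end with a hypothetical echo child standing in for the real utility: the parent sends a str and gets a str back, and Python does the encoding and decoding with the locale's encoding. The sample text is ASCII-only on purpose, so this sketch also runs under an ASCII locale, where non-ASCII input would fail to encode.

```python
import subprocess
import sys

# Hypothetical stand-in for /path/to/binary/utility: a Python child that
# copies its stdin bytes unchanged to stdout.
ECHO = [sys.executable, '-c',
        'import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())']

def echo_text(text):
    """Send a str and receive a str; the pipes are in text mode."""
    p = subprocess.Popen(ECHO, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                         universal_newlines=True)
    out, _ = p.communicate(text)  # str in, str out
    return out

result = echo_text('Basta Ya!\n')  # 'Basta Ya!\n'
```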
If the subprocess requires utf-8 whatever the current locale is, then communicate using bytestrings instead (pass and read bytes):
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import os
from subprocess import Popen, PIPE
p = Popen(['/path/to/binary/utility', '-l', 'nl'],
          stdin=PIPE, stdout=PIPE, stderr=PIPE)
# Send UTF-8 bytes; decode stdout and stderr as UTF-8 and normalize line endings to '\n'.
out, err = map(lambda b: b.decode('utf-8').replace(os.linesep, '\n'),
               p.communicate((u'¡Basta Ya!' + os.linesep).encode('utf-8')))
The code works the same on both Python 2 and 3.
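The question's code writes to popen.stdin line by line rather than all at once. One alternative for that style on Python 3 only (swapped in here, not part of the answer above) is to wrap the binary pipes in io.TextIOWrapper with an explicit encoding='utf-8', which fixes the encoding regardless of the locale. This sketch again uses a hypothetical echo child instead of the real utility; with a utility that produces output while still reading large input, communicate() remains the safer choice because it avoids pipe-buffer deadlocks.

```python
import io
import subprocess
import sys

# Hypothetical stand-in for /path/to/binary/utility: a Python child that
# copies its stdin bytes unchanged to stdout.
ECHO = [sys.executable, '-c',
        'import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())']

def stream_lines_utf8(lines):
    """Write lines one at a time as UTF-8, then read the child's reply as str."""
    p = subprocess.Popen(ECHO, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    # Explicit encoding: UTF-8 is used whatever the current locale is.
    writer = io.TextIOWrapper(p.stdin, encoding='utf-8')
    reader = io.TextIOWrapper(p.stdout, encoding='utf-8')
    for line in lines:
        writer.write(line + '\n')
    writer.close()  # flushes and closes p.stdin, so the child sees EOF
    reply = reader.read()
    reader.close()
    p.wait()
    return reply

reply = stream_lines_utf8([u'¡Basta Ya!'])  # '¡Basta Ya!\n'
```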
Source: https://stackoverflow.com/questions/27722720/how-to-fix-an-encoding-migrating-python-subprocess-to-unicode-literals