When piping the output of a Python program, the Python interpreter gets confused about encoding and sets it to None. This means a program like this:
# -*- co
export PYTHONIOENCODING=utf-8
do the job, but can't set it on python itself ...
what we can do is verify if isn't setting and tell the user to set it before call script with :
if __name__ == '__main__':
if (sys.stdout.encoding is None):
print >> sys.stderr, "please set python env PYTHONIOENCODING=UTF-8, example: export PYTHONIOENCODING=UTF-8, when write to stdout."
exit(1)
Update to reply to the comment: the problem just exist when piping to stdout . I tested in Fedora 25 Python 2.7.13
python --version
Python 2.7.13
cat b.py
#!/usr/bin/env python
#-*- coding: utf-8 -*-
import sys
print sys.stdout.encoding
running ./b.py
UTF-8
running ./b.py | less
None
First, regarding this solution:
# -*- coding: utf-8 -*-
print u"åäö".encode('utf-8')
It's not practical to explicitly print with a given encoding every time. That would be repetitive and error-prone.
A better solution is to change sys.stdout
at the start of your program, to encode with a selected encoding. Here is one solution I found on Python: How is sys.stdout.encoding chosen?, in particular a comment by "toka":
import sys
import codecs
sys.stdout = codecs.getwriter('utf8')(sys.stdout)
You may want to try changing the environment variable "PYTHONIOENCODING" to "utf_8". I have written a page on my ordeal with this problem.
Tl;dr of the blog post:
import sys, locale, os
print(sys.stdout.encoding)
print(sys.stdout.isatty())
print(locale.getpreferredencoding())
print(sys.getfilesystemencoding())
print(os.environ["PYTHONIOENCODING"])
print(chr(246), chr(9786), chr(9787))
gives you
utf_8
False
ANSI_X3.4-1968
ascii
utf_8
ö ☺ ☻
An arguable sanitized version of Craig McQueen's answer.
import sys, codecs
class EncodedOut:
def __init__(self, enc):
self.enc = enc
self.stdout = sys.stdout
def __enter__(self):
if sys.stdout.encoding is None:
w = codecs.getwriter(self.enc)
sys.stdout = w(sys.stdout)
def __exit__(self, exc_ty, exc_val, tb):
sys.stdout = self.stdout
Usage:
with EncodedOut('utf-8'):
print u'ÅÄÖåäö'
I ran into this problem in a legacy application, and it was difficult to identify where what was printed. I helped myself with this hack:
# encoding_utf8.py
import codecs
import builtins
def print_utf8(text, **kwargs):
print(str(text).encode('utf-8'), **kwargs)
def print_utf8(fn):
def print_fn(*args, **kwargs):
return fn(str(*args).encode('utf-8'), **kwargs)
return print_fn
builtins.print = print_utf8(print)
On top of my script, test.py:
import encoding_utf8
string = 'Axwell Λ Ingrosso'
print(string)
Note that this changes ALL calls to print to use an encoding, so your console will print this:
$ python test.py
b'Axwell \xce\x9b Ingrosso'
I could "automate" it with a call to:
def __fix_io_encoding(last_resort_default='UTF-8'):
import sys
if [x for x in (sys.stdin,sys.stdout,sys.stderr) if x.encoding is None] :
import os
defEnc = None
if defEnc is None :
try:
import locale
defEnc = locale.getpreferredencoding()
except: pass
if defEnc is None :
try: defEnc = sys.getfilesystemencoding()
except: pass
if defEnc is None :
try: defEnc = sys.stdin.encoding
except: pass
if defEnc is None :
defEnc = last_resort_default
os.environ['PYTHONIOENCODING'] = os.environ.get("PYTHONIOENCODING",defEnc)
os.execvpe(sys.argv[0],sys.argv,os.environ)
__fix_io_encoding() ; del __fix_io_encoding
Yes, it's possible to get an infinite loop here if this "setenv" fails.