I have a Python program that runs very well. It connects to several websites and outputs the desired information. Since not all websites are encoded in UTF-8, I am requestin
When you run the Python script in your terminal, the terminal is most likely using UTF-8 (especially if you are on Linux or Mac).
When you set the variable l to "some string with latin characters", that string is encoded in the default encoding. If you are using a terminal, l will be UTF-8 and the script won't crash.
A little tip: if you have a string encoded in Latin-1 and you want it as unicode, you can do:
variable.decode('latin1')
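To make the tip concrete, here is a minimal sketch of the decode step. The sample bytes are made up for illustration, and the snippet is written so it also runs on Python 3, where a byte string is spelled b'...':

```python
# Hypothetical sample: the word "café" as Latin-1 bytes (0xE9 is "é").
latin1_bytes = b'caf\xe9'

# Decode the bytes into a unicode string using the encoding they were written in.
text = latin1_bytes.decode('latin1')

# Once the text is unicode, it can be re-encoded to whatever the output needs.
utf8_bytes = text.encode('utf-8')

# Decoding with the wrong codec fails: these bytes are not valid UTF-8.
try:
    latin1_bytes.decode('utf-8')
    wrong_codec_failed = False
except UnicodeDecodeError:
    wrong_codec_failed = True
```

The point is that decode needs the encoding the bytes actually have, not the encoding you want to end up with.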
From the PrintFails wiki:
When Python finds its output attached to a terminal, it sets the sys.stdout.encoding attribute to the terminal's encoding. The print statement's handler will automatically encode unicode arguments into str output.
This is why your program works when called from the terminal.
When Python does not detect the desired character set of the output, it sets sys.stdout.encoding to None, and print will invoke the "ascii" codec.
This is why your program fails when called from PHP.
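The failure can be reproduced without PHP. The sketch below does not go through print at all. It calls the "ascii" codec directly on a made-up sample string, which is roughly what Python 2's print does when sys.stdout.encoding is None:

```python
# Hypothetical sample text containing one non-ASCII character (U+00E9).
text = u'caf\u00e9'

# Encoding with the "ascii" codec raises, because U+00E9 is outside ASCII.
try:
    text.encode('ascii')
    failed = False
    reason = ''
except UnicodeEncodeError as exc:
    failed = True
    reason = str(exc)

# Encoding the same text with an explicit UTF-8 codec succeeds.
utf8_bytes = text.encode('utf-8')
```

So the fix is not to change the text, only to stop relying on the implicit ASCII fallback.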
To make it work when called from PHP, you need to state explicitly which encoding print should use. For example, to state that you want the output encoded in UTF-8 (when not attached to a terminal):
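import sys
# l holds bytes in the page's encoding (named by `encoding`); fall back to UTF-8 when stdout reports none.
ENCODING = sys.stdout.encoding if sys.stdout.encoding else 'utf-8'
print unicode("<div class='line'>%s</div>" % l, encoding).encode(ENCODING)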
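The same decode, wrap, and re-encode steps can be sketched as a small function that also runs on Python 3. The name encode_line and the sample input are hypothetical, and the function decodes before formatting rather than after, which gives the same result here:

```python
def encode_line(raw_bytes, source_encoding, stream_encoding):
    """Decode bytes from the page's encoding, wrap them in a div, re-encode for output."""
    # Same fallback as in the snippet: use UTF-8 when the stream reports no encoding.
    target = stream_encoding if stream_encoding else 'utf-8'
    text = raw_bytes.decode(source_encoding)
    return (u"<div class='line'>%s</div>" % text).encode(target)

# Hypothetical input: "señal" as Latin-1 bytes, with no stream encoding available,
# which is the situation described for a script called from PHP.
out = encode_line(b'se\xf1al', 'latin1', None)
```

Passing the stream encoding in as an argument, instead of reading sys.stdout.encoding inside the function, makes the fallback easy to check without a real terminal.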
Alternatively, you could set the PYTHONIOENCODING environment variable (for example, to utf-8). Then your code should work without changes, both from the terminal and when called from PHP.
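As a rough check of the environment variable approach, the sketch below starts a child Python process with PYTHONIOENCODING set and captures its output through a pipe, so the child is not attached to a terminal. The helper name run_child is made up, and the parent here is Python rather than PHP, but the child sees the same kind of non-terminal stdout:

```python
import os
import subprocess
import sys

def run_child(io_encoding):
    """Run a child Python with PYTHONIOENCODING set and return its raw stdout bytes."""
    # Copy the current environment and add the variable for the child only.
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    # The child writes the single character U+00E9 to its (piped) stdout.
    child_code = "import sys; sys.stdout.write(u'\\u00e9')"
    return subprocess.check_output([sys.executable, '-c', child_code], env=env)

utf8_out = run_child('utf-8')
latin1_out = run_child('latin-1')
```

The bytes the child emits follow the variable, not the terminal, which is why the script itself needs no changes.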