I'm trying to get a very simple example running on OS X with Python 3.5.1, but I'm really stuck. I have read so many articles that deal with similar problems but I can not
Try running your script with the PYTHONIOENCODING environment variable set explicitly:
PYTHONIOENCODING=utf-8 python3 script.py
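To see whether the variable actually took effect, it can help to print what Python thinks the output encoding is. The following is only a minimal sketch; the function name `describe_stdout_encoding` is made up for illustration.

```python
import os
import sys


def describe_stdout_encoding():
    """Return the encoding used by print() and the PYTHONIOENCODING override, if any."""
    return {
        # Encoding Python uses when writing text to standard output.
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        # None means the variable is not set in this process.
        "PYTHONIOENCODING": os.environ.get("PYTHONIOENCODING"),
    }


if __name__ == "__main__":
    for key, value in describe_stdout_encoding().items():
        print(key, "=", value)
```

Running it once with and once without the `PYTHONIOENCODING=utf-8` prefix should show whether the reported stdout encoding changes on your machine.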
If you are facing this problem while reading or writing a file, then try this:
import codecs
# File read
with codecs.open(filename, 'r', encoding='utf8') as f:
    text = f.read()
# File write
with codecs.open(filename, 'w', encoding='utf8') as f:
    f.write(text)
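On Python 3 the built-in `open` accepts the same `encoding` argument, so `codecs` is not strictly needed. Here is a small self-contained sketch of a round trip; the helper name `roundtrip_utf8` and the temporary file are assumptions made for the example, not part of the answer above.

```python
import os
import tempfile


def roundtrip_utf8(text):
    """Write text to a temporary file as UTF-8, then read it back."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)  # Only the path is needed; the file is reopened below.
    try:
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
        with open(path, "r", encoding="utf-8") as f:
            return f.read()
    finally:
        os.remove(path)


sample = "Glück, Spaß, Ähre"
assert roundtrip_utf8(sample) == sample
```

Passing the encoding explicitly means the result no longer depends on the locale settings of the machine running the script.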
Your environment variables are set incorrectly. This works for me:
echo "LC_ALL=en_US.UTF-8" >> /etc/environment
echo "en_US.UTF-8 UTF-8" >> /etc/locale.gen
echo "LANG=en_US.UTF-8" > /etc/locale.conf
locale-gen en_US.UTF-8
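After changing the locale configuration (and starting a new shell), you can check from Python whether the change was picked up. This is just a rough sketch, and `locale_encodings` is a hypothetical helper name.

```python
import locale
import sys


def locale_encodings():
    """Return the encodings Python derives from the current locale settings."""
    return {
        # Default encoding for open() when no encoding argument is given.
        "preferred": locale.getpreferredencoding(False),
        # Encoding used to convert file names to and from bytes.
        "filesystem": sys.getfilesystemencoding(),
    }


if __name__ == "__main__":
    for key, value in locale_encodings().items():
        print(key, "=", value)
```

If the preferred encoding still shows something like ASCII rather than UTF-8, the locale variables are probably not reaching the Python process.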
Remove the characters < and >:
# coding=latin-1
Those characters are often used in examples to indicate where the encoding name goes, but the literal characters < and > should not be included in your file.
For that to work, your file must actually be encoded using latin-1. If your file is encoded using utf-8, the encoding line should be:
# coding=utf-8
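If you want to see which encoding Python reads from a declaration, the standard library's `tokenize.detect_encoding` can be asked directly. The sketch below uses a made-up helper, `declared_encoding`, and feeds it short byte strings in place of real files.

```python
import io
import tokenize


def declared_encoding(source_bytes):
    """Return the source encoding Python detects from the first two lines."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


# A valid declaration is recognized (latin-1 is reported under its normalized name).
print(declared_encoding(b"# coding=latin-1\nx = 1\n"))    # iso-8859-1
print(declared_encoding(b"# coding=utf-8\nx = 1\n"))      # utf-8
# With the literal < and > the declaration is not recognized,
# so the default (utf-8) is used instead.
print(declared_encoding(b"# coding=<latin-1>\nx = 1\n"))  # utf-8
```

This only reports what the declaration says; it does not check that the rest of the file is really saved in that encoding.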
For example, when I run this script (saved as a file with latin-1 encoding):
# coding=latin-1
mylist = [u'Glück', u'Spaß', u'Ähre',]
print(mylist)
for w in mylist:
    print(w.encode("latin-1"))
I get this output (with no errors):
['Glück', 'Spaß', 'Ähre']
b'Gl\xfcck'
b'Spa\xdf'
b'\xc4hre'
That output looks correct. For example, the latin-1 encoding of ü is '\xfc'.
I used my editor to save the file with latin-1 encoding. The contents of the file in hexadecimal are:
$ hexdump -C codec-question.py
00000000 23 20 63 6f 64 69 6e 67 3d 6c 61 74 69 6e 2d 31 |# coding=latin-1|
00000010 0a 0a 6d 79 6c 69 73 74 20 3d 20 5b 75 27 47 6c |..mylist = [u'Gl|
00000020 fc 63 6b 27 2c 20 75 27 53 70 61 df 27 2c 20 75 |.ck', u'Spa.', u|
00000030 27 c4 68 72 65 27 2c 5d 0a 70 72 69 6e 74 28 6d |'.hre',].print(m|
00000040 79 6c 69 73 74 29 0a 0a 66 6f 72 20 77 20 69 6e |ylist)..for w in|
00000050 20 6d 79 6c 69 73 74 3a 0a 20 20 20 20 70 72 69 | mylist:.    pri|
00000060 6e 74 28 77 2e 65 6e 63 6f 64 65 28 22 6c 61 74 |nt(w.encode("lat|
00000070 69 6e 2d 31 22 29 29 0a |in-1")).|
00000078
Note that the first byte (represented in hexadecimal) in the third line (i.e. the character at position 0x20) is fc. That is the latin-1 encoding of ü. If the file were encoded using utf-8, the character ü would be represented using two bytes, c3 bc.
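The byte values mentioned here can be reproduced in a few lines. This is only an illustrative sketch; `encoded_forms` is a hypothetical helper, and the character is written as the escape `\u00fc` so that the result does not depend on how the snippet itself is saved.

```python
def encoded_forms(ch):
    """Return the latin-1 and utf-8 byte encodings of a string."""
    return ch.encode("latin-1"), ch.encode("utf-8")


latin1_bytes, utf8_bytes = encoded_forms("\u00fc")  # the character ü
print(latin1_bytes)  # b'\xfc'
print(utf8_bytes)    # b'\xc3\xbc'

# The latin-1 byte on its own is not valid UTF-8, which is why reading
# a latin-1 file as UTF-8 fails.
try:
    b"Gl\xfcck".decode("utf-8")
except UnicodeDecodeError as exc:
    print("not valid UTF-8:", exc.reason)
```

Going the other way, decoding the UTF-8 bytes as latin-1 does not raise an error but yields two wrong characters instead of one ü.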