I've never fully wrapped my head around encoding and decoding Unicode to other formats (UTF-8, UTF-16, ASCII, etc.), but I've reached a wall that is both confusing and frustrating.
Do not encode to utf-8; print Unicode directly instead:
print(u'♠')
See how to print Unicode to Windows console.
What I'm trying to do is print utf-8 card symbols (♠,♥,♦,♣) from a python module to a windows console
UTF-8 is a byte encoding of Unicode characters. ♠♥♦♣ are Unicode characters which can be reproduced in a variety of encodings and UTF-8 is one of those encodings—as a UTF, UTF-8 can reproduce any Unicode character. But there is nothing specifically “UTF-8” about those characters.
Other encodings that can reproduce the characters ♠♥♦♣ are Windows code page 850 and 437, which your console is likely to be using under a Western European install of Windows. You can print ♠ in these encodings but you are not using UTF-8 to do so, and you won't be able to use other Unicode characters that are available in UTF-8 but outside the scope of these code pages.
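To make the "one character, many encodings" point concrete, here is a small sketch. The helper name `encoded_forms` and the extra character 'é' are my own choices for illustration, not something from the question.

```python
def encoded_forms(char, encodings):
    """Map each encoding name to the bytes that encoding produces for `char`."""
    return {name: char.encode(name) for name in encodings}

# The same Unicode character becomes different bytes under different encodings.
spade = encoded_forms('♠', ['utf-8', 'utf-16-le'])

# 'é' (an illustrative choice) is a single byte in the two code pages
# mentioned above, but two bytes in UTF-8.
e_acute = encoded_forms('é', ['utf-8', 'cp437', 'cp850'])

for name, raw in list(spade.items()) + list(e_acute.items()):
    print(name, raw)  # only the ASCII-safe bytes repr is printed
```

The character itself is the same in every case; only the bytes that represent it change.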
print(u'♠')
UnicodeEncodeError: 'charmap' codec can't encode character '\u2660'
In Python 3 this is the same as the print('♠') test you did above, so there is something different about how you are invoking the script containing this print, compared to your py -3.4. What does sys.stdout.encoding give you from the script?
To get print working correctly you would have to make sure Python picks up the right encoding. If it is not doing that adequately from the terminal settings you would indeed have to set PYTHONIOENCODING to cp437.
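A quick way to answer the "what encoding did Python pick up?" question is to dump the relevant values from inside the script. This is only a sketch; `describe_stdout` is a hypothetical helper name.

```python
import os
import sys

def describe_stdout():
    """Collect the settings that decide how print() encodes its output."""
    return {
        # The encoding print() will use when writing to stdout
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        # None when the environment variable is not set
        "PYTHONIOENCODING": os.environ.get("PYTHONIOENCODING"),
    }

for key, value in describe_stdout().items():
    print(f"{key}: {value}")
```

Run it the same way you run the failing script, then again from the interactive prompt, and compare the two outputs.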
>>> text = '♠'
>>> print(text.encode('utf-8'))
b'\xe2\x99\xa0'
print can only print Unicode strings. For other types, including the bytes string that results from the encode() method, it gets the literal representation (repr) of the object. b'\xe2\x99\xa0' is how you would write a Python 3 bytes literal containing a UTF-8 encoded ♠.
If what you want to do is bypass print's implicit encoding to PYTHONIOENCODING and substitute your own, you can do that explicitly:
>>> import sys
>>> sys.stdout.buffer.write('♠'.encode('cp437'))
This will of course generate wrong output for any console not running code page 437 (e.g. non-Western-European installs). Generally, for apps using the C stdio, like Python does, getting non-ASCII characters to the Windows console is just too unreliable to bother with.
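If you do write the bytes yourself, you probably want a fallback for characters the chosen code page lacks. A minimal sketch, assuming you are happy with '?' as the substitute; `write_encoded` is a hypothetical helper, and an in-memory buffer stands in for sys.stdout.buffer here.

```python
import io

def write_encoded(text, encoding, binary_stream, errors="replace"):
    """Encode `text` explicitly and write the raw bytes to a binary stream.

    With errors="replace", characters the encoding cannot represent
    become b'?' instead of raising UnicodeEncodeError.
    """
    data = text.encode(encoding, errors=errors)
    binary_stream.write(data)
    return data

# Demo against an in-memory stream; 'é' exists in cp437, '€' does not.
demo = io.BytesIO()
write_encoded("é €", "cp437", demo)
print(demo.getvalue())
```

In a real script you would pass sys.stdout.buffer as the stream, with the same caveat about the console's actual code page.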
Since Python 3.7, you can reconfigure stdout:
import sys
sys.stdout.reconfigure(encoding='utf-8')
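If the script may also run on older Pythons, or with stdout replaced by an object that has no reconfigure method, a guarded version might look like this. It is a sketch; `force_utf8` is a hypothetical name, and an in-memory stream stands in for sys.stdout in the demo.

```python
import io

def force_utf8(stream):
    """Switch a text stream to UTF-8 if it supports reconfigure().

    Returns True if the stream was reconfigured, False otherwise.
    """
    if not hasattr(stream, "reconfigure"):
        return False
    stream.reconfigure(encoding="utf-8")
    return True

# Demo: a text stream that starts out as cp437, like a Western European console.
raw = io.BytesIO()
stream = io.TextIOWrapper(raw, encoding="cp437")
force_utf8(stream)
stream.write("♠")
stream.flush()
print(raw.getvalue())
```

In your own script you would call force_utf8(sys.stdout) once, before the first print.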
You can look at it this way. A string is a sequence of characters, not a sequence of bytes. Characters are Unicode code points. Bytes are just numbers in the range 0–255. At the low level, computers work just with sequences of bytes. If you want to print a string, you just call print(a_string) in Python. But to communicate with the OS environment, the string has to be encoded to a sequence of bytes. This is done automatically somewhere under the hood of the print function. The encoding used is sys.stdout.encoding. If you get a UnicodeEncodeError, it means that your characters cannot be encoded using the current encoding.
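You can reproduce that hidden encoding step without a real console by pointing print at an in-memory text stream with a chosen encoding. A rough sketch; `print_with_encoding` is a hypothetical helper, and 'é' and '€' are illustrative characters.

```python
import io

def print_with_encoding(text, encoding):
    """Print `text` to an in-memory stream that uses `encoding`.

    Returns the bytes that were produced, or None if the text
    cannot be encoded (the UnicodeEncodeError case).
    """
    raw = io.BytesIO()
    # newline="" disables newline translation so the bytes are predictable
    stream = io.TextIOWrapper(raw, encoding=encoding, newline="")
    try:
        print(text, file=stream, end="")
        stream.flush()
    except UnicodeEncodeError:
        return None
    finally:
        stream.detach()  # keep `raw` open after the wrapper goes away
    return raw.getvalue()

print(print_with_encoding("é", "cp437"))   # encodable in cp437
print(print_with_encoding("€", "cp437"))   # not in cp437, so None
print(print_with_encoding("♠", "utf-8"))   # UTF-8 can encode any character
```

The text stream here plays the role of sys.stdout, and its encoding argument plays the role of sys.stdout.encoding.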
As far as I know, it is currently not possible to run Python on Windows in such a way that the encoding used is capable of encoding every character (as UTF-8 or UTF-16 are) and is both assumed by Python and really used by the OS environment for both input and output. There is a workaround: you can use the win_unicode_console package, which aims to solve this issue. Just install it with pip install win_unicode_console, and in your sitecustomize import it and call win_unicode_console.enable(). This will serve as an external patch to your Python installation regarding this issue. See the documentation for more information: https://github.com/Drekin/win-unicode-console.
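A sitecustomize that should not break on machines without the package (or on non-Windows systems) could guard the call roughly like this. Only win_unicode_console.enable() comes from the package as described above; the wrapper function `enable_unicode_console` is a hypothetical name of mine.

```python
import sys

def enable_unicode_console():
    """Enable win_unicode_console when possible; report whether it happened."""
    if sys.platform != "win32":
        return False  # the package only targets the Windows console
    try:
        import win_unicode_console
    except ImportError:
        return False  # not installed: leave the streams as they are
    win_unicode_console.enable()
    return True

enabled = enable_unicode_console()
```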
By default, the console in Microsoft Windows only displays 256 characters (cp437, or "Code page 437", the original IBM PC 1981 extended ASCII character set), as you say in the comments.
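On the other side, PYTHONIOENCODING, when it is set, overrides the encoding Python uses for stdout. So I think when you want to print Unicode on Windows, you have to make sys.stdout.encoding (and therefore PYTHONIOENCODING, if you set it) agree with the code page the console is actually using.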
Also note that when you specify an encoding for your .py file, it is only used for reading that source file and doesn't change the default system encoding.
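So do something like this:
my_str = '♠'  # or something like my_str = '\u05dd'
# Reinterpret the UTF-8 bytes as cp437 text, so that a stdout whose encoding
# is cp437 writes the original UTF-8 bytes back out unchanged
print(my_str.encode('utf-8').decode('cp437'))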