Python, Unicode, and the Windows console

佛祖请我去吃肉 2020-11-21 04:38

When I try to print a Unicode string in a Windows console, I get a UnicodeEncodeError: 'charmap' codec can't encode character .... error. I assume this is because the Windows console does not accept Unicode-only characters. What's the best way around this?

13 answers
  •  天涯浪人
    2020-11-21 05:18

    Update: Python 3.6 implements PEP 528 ("Change Windows console encoding to UTF-8"): the default console on Windows now accepts all Unicode characters. Internally, it uses the same Unicode API as the win-unicode-console package mentioned below. print(unicode_string) should just work now.


    "I get a UnicodeEncodeError: 'charmap' codec can't encode character... error."

    The error means that the Unicode characters you are trying to print can't be represented in the current console character encoding (the code page reported by chcp). The code page is often an 8-bit encoding such as cp437, which can represent only ~0x100 of the ~1M Unicode characters:

    >>> u"\N{EURO SIGN}".encode('cp437')
    Traceback (most recent call last):
    ...
    UnicodeEncodeError: 'charmap' codec can't encode character '\u20ac' in position 0:
    character maps to <undefined>
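
    The same check can be run ahead of time to see which characters a given code page would reject. This is a minimal sketch, not part of the original answer; unencodable_chars is a hypothetical helper name, and ascii() is used for printing so the demo itself cannot hit the error it illustrates:

```python
def unencodable_chars(text, encoding):
    """Return the characters of `text` that `encoding` cannot represent."""
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


sample = "price: 5\N{EURO SIGN}"
# cp437 has no euro sign, so it is reported as unencodable.
print(ascii(unencodable_chars(sample, "cp437")))  # ['\u20ac']
# UTF-8 can represent every character in the sample.
print(ascii(unencodable_chars(sample, "utf-8")))  # []
```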

    "I assume this is because the Windows console does not accept Unicode-only characters. What's the best way around this?"

    The Windows console does accept Unicode characters, and it can even display them (BMP only) if a suitable font is configured. The WriteConsoleW() API should be used, as suggested in @Daira Hopwood's answer. It can be called transparently, i.e., you don't need to and should not modify your scripts, if you use the win-unicode-console package:

    T:\> py -mpip install win-unicode-console
    T:\> py -mrun your_script.py
    

    See "What's the deal with Python 3.4, Unicode, different languages and Windows?"

    "Is there any way I can make Python automatically print a ? instead of failing in this situation?"

    If replacing every unencodable character with ? is good enough in your case, you can set the PYTHONIOENCODING environment variable. The leading colon leaves the encoding unchanged and sets only the error handler:

    T:\> set PYTHONIOENCODING=:replace
    T:\> python3 -c "print(u'[\N{EURO SIGN}]')"
    [?]
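
    The effect of the "replace" error handler can be reproduced in-process without touching the real console. This is a sketch under assumptions: render is a hypothetical helper, and a BytesIO wrapped in a cp437 text stream stands in for a console using that code page:

```python
import io


def render(text, encoding="cp437", errors="replace"):
    """Return the bytes a text stream with this encoding/error handler would emit."""
    buffer = io.BytesIO()
    stream = io.TextIOWrapper(buffer, encoding=encoding, errors=errors)
    stream.write(text)
    stream.flush()  # push the encoded bytes into the underlying buffer
    return buffer.getvalue()


# "replace" substitutes ? for anything the code page cannot encode.
print(render("[\N{EURO SIGN}]"))                             # b'[?]'
# "backslashreplace" keeps the information as an escape sequence instead.
print(render("[\N{EURO SIGN}]", errors="backslashreplace"))  # b'[\\u20ac]'
```

    Any error handler name accepted by str.encode() can be used after the colon in PYTHONIOENCODING, so backslashreplace is an option when the lost characters need to stay recoverable.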
    

    In Python 3.6+, the encoding specified by the PYTHONIOENCODING envvar is ignored for interactive console buffers unless the PYTHONLEGACYWINDOWSIOENCODING envvar is set to a non-empty string.
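
    To see which encoding and error handler an interpreter actually ended up with, the standard stream attributes can be inspected. This is a minimal sketch; describe_stream is a hypothetical helper name, and the reported values depend on the Python version, the environment variables above, and whether output goes to a console or is redirected:

```python
import sys


def describe_stream(stream):
    """Collect the settings that decide how text written to `stream` is encoded."""
    return {
        "encoding": getattr(stream, "encoding", None),
        "errors": getattr(stream, "errors", None),
        # True only when the stream is attached to an interactive console.
        "isatty": stream.isatty() if hasattr(stream, "isatty") else False,
    }


print(sys.version_info[:2], describe_stream(sys.stdout))
```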
