Python, Unicode, and the Windows console

佛祖请我去吃肉 2020-11-21 04:38

When I try to print a Unicode string in a Windows console, I get a UnicodeEncodeError: 'charmap' codec can't encode character .... error. I assume this is b

13 answers
  • 2020-11-21 05:11

    If you're not interested in getting a reliable representation of the bad character(s), you might use something like this (works with Python >= 2.6, including 3.x):

    from __future__ import print_function
    import sys
    
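    def safeprint(s):
        try:
            print(s)
        except UnicodeEncodeError:
            if sys.version_info >= (3,):
                # Python 3: reinterpret the UTF-8 bytes in the console encoding (lossy but printable)
                print(s.encode('utf8').decode(sys.stdout.encoding))
            else:
                # Python 2: print the raw UTF-8 bytes
                print(s.encode('utf8'))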
    
    safeprint(u"\N{EM DASH}")
    

    The bad character(s) in the string will be converted into a representation that the Windows console can print.
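To see what that fallback produces without touching a real console, here is a minimal sketch. The helper names `mojibake_fallback` and `escaped_fallback`, and the choice of cp437 as the console code page, are assumptions for illustration. The first helper mirrors the Python 3 branch above; the second uses `errors="backslashreplace"`, which is a different technique from the answer's and is shown only for cases where a readable escape is preferred.

```python
def mojibake_fallback(s, console_encoding="cp437"):
    """Reinterpret the UTF-8 bytes of s in the console encoding (printable, but garbled)."""
    return s.encode("utf8").decode(console_encoding)


def escaped_fallback(s, console_encoding="cp437"):
    """Alternative: keep encodable characters and escape the rest as \\uXXXX."""
    encoded = s.encode(console_encoding, errors="backslashreplace")
    return encoded.decode(console_encoding)


text = u"a\N{EM DASH}b"          # cp437 has no em dash
garbled = mojibake_fallback(text)  # three substitute characters replace the dash
escaped = escaped_fallback(text)   # the dash becomes a backslash escape
```

The first form always prints but is hard to read back; the second keeps the code point visible, which may matter when debugging.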

  • 2020-11-21 05:11

    Like Giampaolo Rodolà's answer, but even dirtier. I really, really intend to spend a long time (soon) understanding the whole subject of encodings and how they apply to Windoze consoles.

    For the moment I just wanted something which would mean my program would NOT CRASH, and which I understood ... and also which didn't involve importing too many exotic modules (in particular I'm using Jython, so half the time a Python module turns out not to be available).

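    from __future__ import print_function  # needed on Python 2 / Jython for print(..., end='')
    
    def pr(s):
        try:
            print(s)
        except UnicodeEncodeError:
            # Fall back to one character at a time, substituting '?' for failures
            for c in s:
                try:
                    print(c, end='')
                except UnicodeEncodeError:
                    print('?', end='')
            print()  # finish the line, as print(s) would have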
    

    NB "pr" is shorter to type than "print" (and quite a bit shorter to type than "safeprint")...!
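The per-character fallback can be exercised without a Windows console by writing to a stream that only accepts one code page. The following is a sketch under assumptions: `pr_to` is a hypothetical variant of `pr` that takes the target stream as a parameter, and the `io.TextIOWrapper` over `io.BytesIO` with cp437 merely stands in for a console.

```python
import io


def pr_to(stream, s):
    """Write s plus a newline, replacing characters the stream cannot encode with '?'."""
    try:
        stream.write(s + "\n")
    except UnicodeEncodeError:
        # Retry one character at a time so only the offending ones are replaced.
        for c in s:
            try:
                stream.write(c)
            except UnicodeEncodeError:
                stream.write("?")
        stream.write("\n")


# Simulated console (assumption): a text stream limited to cp437.
raw = io.BytesIO()
fake_console = io.TextIOWrapper(raw, encoding="cp437", newline="\n")

pr_to(fake_console, u"caf\u00e9 \N{EM DASH} ok")  # e-acute exists in cp437, the em dash does not
fake_console.flush()
output = raw.getvalue()
```

Passing the stream in makes the behaviour easy to check in a test, at the cost of a slightly longer call than `pr(s)`.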

  • 2020-11-21 05:16

    Python 3.6, Windows 7: there are several ways to launch Python. You can use the Python console (the one with the Python logo on it) or the Windows console (the one labeled cmd.exe).

    I could not print UTF-8 characters in the Windows console. Printing them gave me this error:

    OSError: [WinError 87] The parameter is incorrect
    Exception ignored in: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='utf8'>
    OSError: [WinError 87] The parameter is incorrect
    

    After trying and failing to understand the answer above, I discovered it was only a settings problem. Right-click the title bar of the cmd console window, open Properties, and on the Font tab choose Lucida Console.

  • 2020-11-21 05:17

    For Python 2 try:

    print unicode(string, 'unicode-escape')
    

    For Python 3 try:

    import os
    string = "002 Could've Would've Should've"
    os.system('echo ' + string)
    

    Or try win-unicode-console:

    pip install win-unicode-console
    py -mrun your_script.py
    
  • 2020-11-21 05:18

    Update: Python 3.6 implements PEP 528: Change Windows console encoding to UTF-8: the default console on Windows will now accept all Unicode characters. Internally, it uses the same Unicode API as the win-unicode-console package mentioned below. print(unicode_string) should just work now.


    I get a UnicodeEncodeError: 'charmap' codec can't encode character... error.

    The error means that the Unicode characters you are trying to print can't be represented in the current console character encoding (the code page reported by chcp). The code page is often an 8-bit encoding such as cp437, which can represent only ~0x100 of the ~1M Unicode characters:

    >>> u"\N{EURO SIGN}".encode('cp437')
    Traceback (most recent call last):
    ...
    UnicodeEncodeError: 'charmap' codec can't encode character '\u20ac' in position 0:
    character maps to <undefined>
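The "~0x100 characters" figure can be checked directly. This is a small sketch, with `can_encode` as a hypothetical helper name, that counts the characters cp437 covers and tests individual characters against it.

```python
def can_encode(ch, encoding):
    """Return True if ch is representable in the given encoding."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Each of the 256 byte values decodes to one distinct character in cp437,
# so those characters are everything the code page can represent.
cp437_chars = set(bytes(range(256)).decode("cp437"))

has_e_acute = can_encode(u"\u00e9", "cp437")       # present in cp437
has_euro = can_encode(u"\N{EURO SIGN}", "cp437")   # absent, hence the error above
```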

    I assume this is because the Windows console does not accept Unicode-only characters. What's the best way around this?

    The Windows console does accept Unicode characters, and it can even display them (BMP only) if a suitable font is configured. The WriteConsoleW() API should be used, as suggested in @Daira Hopwood's answer. It can be called transparently, i.e., you don't need to and should not modify your scripts, if you use the win-unicode-console package:

    T:\> py -mpip install win-unicode-console
    T:\> py -mrun your_script.py
    

    See What's the deal with Python 3.4, Unicode, different languages and Windows?

    Is there any way I can make Python automatically print a ? instead of failing in this situation?

    If replacing all unencodable characters with ? is enough in your case, you could set the PYTHONIOENCODING environment variable:

    T:\> set PYTHONIOENCODING=:replace
    T:\> python3 -c "print(u'[\N{EURO SIGN}]')"
    [?]
    

    In Python 3.6+, the encoding specified by PYTHONIOENCODING envvar is ignored for interactive console buffers unless PYTHONLEGACYWINDOWSIOENCODING envvar is set to a non-empty string.
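The same "replace" error handler can also be selected from inside a program. The sketch below assumes Python 3.7+, where `io.TextIOWrapper.reconfigure()` is available, and uses an in-memory cp437 stream as a stand-in for a console; on a real stream the analogous call would be `sys.stdout.reconfigure(errors="replace")`. It is an in-process analogue, not what the environment variable itself does at startup.

```python
import io

# Stand-in for a console limited to cp437 (assumption for illustration).
raw = io.BytesIO()
stream = io.TextIOWrapper(raw, encoding="cp437", newline="\n")

# Keep the encoding, but substitute '?' for anything it cannot represent.
stream.reconfigure(errors="replace")  # requires Python 3.7+

stream.write(u"[\N{EURO SIGN}]\n")
stream.flush()
result = raw.getvalue()
```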

  • 2020-11-21 05:22

    Somewhat related to the answer by J. F. Sebastian, but more direct.

    If you are having this problem when printing to the console/terminal, then do this:

    >set PYTHONIOENCODING=UTF-8
    
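When the Python program is started from another Python process rather than from a shell, the variable can be set in the child's environment instead. This is a sketch under assumptions: `run_child_with_utf8` is a hypothetical helper, and the child's output is captured through a pipe, so it shows the encoding of piped output rather than of an interactive console.

```python
import os
import subprocess
import sys


def run_child_with_utf8(code):
    """Run code in a child Python with PYTHONIOENCODING=UTF-8 and return its stdout text."""
    env = dict(os.environ, PYTHONIOENCODING="UTF-8")
    completed = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
    )
    # The child wrote UTF-8 bytes, so decode them the same way.
    return completed.stdout.decode("utf-8")


# The raw string keeps the escape ASCII-only on the command line;
# the child turns it into a real euro sign when it runs.
child_output = run_child_with_utf8(r"print('\u20ac')")
```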