Python, Unicode, and the Windows console

佛祖请我去吃肉 2020-11-21 04:38

When I try to print a Unicode string in a Windows console, I get a UnicodeEncodeError: 'charmap' codec can't encode character .... error. I assume this is b

13 answers
  • 2020-11-21 05:02

    James Sulak asked,

    Is there any way I can make Python automatically print a ? instead of failing in this situation?

    Other solutions recommend we attempt to modify the Windows environment or replace Python's print() function. The answer below comes closer to fulfilling Sulak's request.

    Under Windows 7, Python 3.5 can be made to print Unicode without throwing a UnicodeEncodeError as follows:

        In place of:    print(text)
        substitute:     print(str(text).encode('utf-8'))

    Instead of throwing an exception, Python now prints the repr of the resulting bytes object, in which each non-ASCII character appears as the \xNN escapes of its UTF-8 bytes, e.g.:

      b'Halmalo n\xe2\x80\x99\xc3\xa9tait plus qu\xe2\x80\x99un point noir'

    Instead of

      Halmalo n’était plus qu’un point noir

    Granted, the latter is preferable ceteris paribus, but otherwise the former is completely accurate for diagnostic messages. Because it displays Unicode as literal byte values the former may also assist in diagnosing encode/decode problems.

    Note: The str() call above guards against text not being a str. Calling encode() directly on another type, such as a tuple of numbers, raises an AttributeError.
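
For diagnostic output that stays readable as text rather than as a bytes repr, one alternative is Python's built-in backslashreplace error handler. The following is a minimal Python 3 sketch, not part of the original answer; the helper name escape_for_console and the default target encoding ascii are assumptions chosen for illustration.

```python
def escape_for_console(text, encoding="ascii"):
    """Return text with characters the target encoding lacks shown as backslash escapes."""
    # str() mirrors the answer above: it tolerates non-str inputs.
    encoded = str(text).encode(encoding, errors="backslashreplace")
    # Decoding again yields a str, so print() shows no b'...' wrapper.
    return encoded.decode(encoding)


if __name__ == "__main__":
    # Only ASCII reaches the console, so this cannot raise UnicodeEncodeError.
    print(escape_for_console("Halmalo n\u2019\u00e9tait plus qu\u2019un point noir"))
```

Here each unencodable character is shown as an escape of its code point (for example \u2019) rather than as UTF-8 byte values, which may or may not be what a given diagnosis needs.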

  • 2020-11-21 05:03

    The cause of your problem is not that the Windows console refuses Unicode (it has accepted it by default since, I believe, Windows 2000). It is the default system encoding. Try this code and see what it gives you:

    import sys
    sys.getdefaultencoding()
    

    If it says ascii, there's your cause ;-) You have to create a file called sitecustomize.py and put it on the Python path (I put it under /usr/lib/python2.5/site-packages, but that is different on Windows: it is c:\python\lib\site-packages or something), with the following contents:

    import sys
    sys.setdefaultencoding('utf-8')
    

    and perhaps you might want to specify the encoding in your files as well:

    # -*- coding: UTF-8 -*-
    import sys,time
    

    Edit: more info can be found in the excellent Dive into Python book.
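
On Python 3 the check above is less informative: sys.getdefaultencoding() always reports utf-8 and sys.setdefaultencoding() no longer exists, while print() encodes with the encoding of sys.stdout. A small sketch for inspecting the values that matter there, assuming Python 3; the function name describe_encodings is hypothetical.

```python
import sys


def describe_encodings():
    """Collect the encodings that influence text output in this process."""
    return {
        # Used for implicit str/bytes conversions; always utf-8 on Python 3.
        "default": sys.getdefaultencoding(),
        # Used by print(); may be None if sys.stdout was replaced by a custom object.
        "stdout": getattr(sys.stdout, "encoding", None),
        # Used for file names, shown only for comparison.
        "filesystem": sys.getfilesystemencoding(),
    }


if __name__ == "__main__":
    for name, value in describe_encodings().items():
        print(name, value)
```

If the stdout entry names a legacy code page such as cp437 or cp1252, that is the codec a 'charmap' error message refers to.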

  • 2020-11-21 05:04

    Note: This answer is sort of outdated (from 2008). Please use the solution below with care!!


    Here is a page that details the problem and a solution (search the page for the text Wrapping sys.stdout into an instance):

    PrintFails - Python Wiki

    Here's a code excerpt from that page:

    $ python -c 'import sys, codecs, locale; print sys.stdout.encoding; \
        sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout); \
        line = u"\u0411\n"; print type(line), len(line); \
        sys.stdout.write(line); print line'
      UTF-8
      <type 'unicode'> 2
      Б
      Б
    
    $ python -c 'import sys, codecs, locale; print sys.stdout.encoding; \
        sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout); \
        line = u"\u0411\n"; print type(line), len(line); \
        sys.stdout.write(line); print line' | cat
      None
      <type 'unicode'> 2
      Б
      Б
    

    There's some more information on that page, well worth a read.
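
The excerpt is Python 2. A rough Python 3 counterpart of the same wrapping idea can be sketched with the standard io.TextIOWrapper class, here with errors='replace' added so that unencodable characters become ? instead of raising. The helper name wrap_for_console is hypothetical, and cp437 merely stands in for a legacy console code page.

```python
import io


def wrap_for_console(raw, encoding, errors="replace"):
    """Wrap a binary stream so that str output is encoded leniently."""
    return io.TextIOWrapper(
        raw,
        encoding=encoding,
        errors=errors,        # '?' for anything the encoding cannot represent
        newline="\n",         # write newlines unchanged on every platform
        line_buffering=True,
    )


if __name__ == "__main__":
    buffer = io.BytesIO()     # stands in for sys.stdout.buffer
    out = wrap_for_console(buffer, "cp437")
    out.write("\u0411\n")     # CYRILLIC CAPITAL LETTER BE, absent from cp437
    out.flush()
    print(buffer.getvalue())
```

In a real script the binary stream would typically be sys.stdout.buffer and the encoding the one reported by sys.stdout.encoding; both choices are assumptions to adapt.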

  • 2020-11-21 05:05

    TL;DR:

    print(yourstring.encode('ascii', 'replace'))
    

    I ran into this myself while working on a Twitch chat (IRC) bot (latest Python 2.7).

    I wanted to parse chat messages in order to respond...

    msg = s.recv(1024).decode("utf-8")
    

    but also print them safely to the console in a human-readable format:

    print(msg.encode('ascii', 'replace'))
    

    This stopped the bot from throwing UnicodeEncodeError: 'charmap' errors and replaced the non-ASCII characters with ?.
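
On Python 3 the same call prints a bytes repr such as b'caf?' rather than plain text. A minimal sketch of an equivalent there, assuming Python 3 and decoding the replaced bytes back to str; the helper name ascii_safe and the sample bytes are made up for illustration.

```python
def ascii_safe(text):
    """Replace every non-ASCII character in text with '?'."""
    return text.encode("ascii", errors="replace").decode("ascii")


if __name__ == "__main__":
    # Hypothetical bytes as they might arrive from a socket.
    raw = b"caf\xc3\xa9 \xd0\xbc\xd0\xb8\xd1\x80"
    msg = raw.decode("utf-8")   # keep the real text for parsing
    print(ascii_safe(msg))      # print only a console-safe copy
```

The original msg stays intact for parsing; only the printed copy loses information.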

  • 2020-11-21 05:08

    The code below makes Python write UTF-8 to the console, even on Windows.

    The console displays the characters well on Windows 7 but not on Windows XP. Even there, though, the script works and, most importantly, produces consistent output on all platforms. You will also be able to redirect the output to a file.

    The code below was tested with Python 2.6 on Windows.

    
    #!/usr/bin/python
    # -*- coding: UTF-8 -*-
    # Python 2 only: reload(sys) and sys.setdefaultencoding() do not exist in Python 3.

    import codecs
    import sys

    # site.py removes setdefaultencoding at startup; reload(sys) brings it back.
    reload(sys)
    sys.setdefaultencoding('utf-8')

    print sys.getdefaultencoding()

    if sys.platform == 'win32':
        try:
            import win32console
        except ImportError:
            print "Python Win32 Extensions module is required.\n You can download it from https://sourceforge.net/projects/pywin32/ (x86 and x64 builds are available)\n"
            sys.exit(-1)
        # win32console implementation of SetConsoleCP does not return a value,
        # so read the code page back to verify the change.
        CP_UTF8 = 65001
        win32console.SetConsoleCP(CP_UTF8)
        if win32console.GetConsoleCP() != CP_UTF8:
            raise Exception("Cannot set console codepage to 65001 (UTF-8)")
        win32console.SetConsoleOutputCP(CP_UTF8)
        if win32console.GetConsoleOutputCP() != CP_UTF8:
            raise Exception("Cannot set console output codepage to 65001 (UTF-8)")

    # Wrap the standard streams so that output is encoded as UTF-8.
    sys.stdout = codecs.getwriter('utf8')(sys.stdout)
    sys.stderr = codecs.getwriter('utf8')(sys.stderr)

    print "This is an Е乂αmp١ȅ testing Unicode support using Arabic, Latin, Cyrillic, Greek, Hebrew and CJK code points.\n"
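
The script above depends on Python 2 features. On Python 3.7 or later, text streams offer a reconfigure() method that can switch the output encoding without extra modules. A hedged sketch, assuming Python 3.7+; the helper name force_utf8 is hypothetical, and the console still needs a font and code page able to show the characters.

```python
import io


def force_utf8(stream):
    """Switch a text stream to UTF-8 if it supports reconfigure(); report success."""
    reconfigure = getattr(stream, "reconfigure", None)
    if reconfigure is None:
        # Not an io.TextIOWrapper (for example a replaced sys.stdout); leave it alone.
        return False
    reconfigure(encoding="utf-8")
    return True


if __name__ == "__main__":
    # Demonstrate on an in-memory stream that starts out as ASCII.
    raw = io.BytesIO()
    demo = io.TextIOWrapper(raw, encoding="ascii")
    force_utf8(demo)
    demo.write("\u0415\u4e42")
    demo.flush()
    print(raw.getvalue())
    # In a real script: import sys; force_utf8(sys.stdout); force_utf8(sys.stderr)
```

Unlike the codecs.getwriter approach, this keeps the original stream object, so code holding a reference to sys.stdout continues to work.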
    
  • 2020-11-21 05:09

    Just enter this command at the command line before running your Python script:

    chcp 65001 & set PYTHONIOENCODING=utf-8
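
The environment variable half of that command can also be applied per child process from Python. A small sketch, assuming Python 3.7+ and that launching a subprocess is acceptable; the helper name child_stdout_encoding is hypothetical, and chcp itself is not invoked here.

```python
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding):
    """Run a child interpreter with PYTHONIOENCODING set and return its stdout encoding."""
    # Copy the current environment and override only the one variable.
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(child_stdout_encoding("utf-8"))
```

This only changes how the child encodes its output; whether the console renders the resulting bytes correctly still depends on the active code page.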
    