Setting the correct encoding when piping stdout in Python

迷失自我 2020-11-22 01:21

When piping the output of a Python program, the Python interpreter gets confused about encoding and sets it to None. This means a program like this:

# -*- co         


        
10 Answers
  • 2020-11-22 02:10

    On Windows, I had this problem very often when running Python code from an editor (like Sublime Text), but not when running it from the command line.

    In this case, check your editor's settings. In the case of Sublime Text, this Python.sublime-build file solved it:

    {
      "cmd": ["python", "-u", "$file"],
      "file_regex": "^[ ]*File \"(...*?)\", line ([0-9]*)",
      "selector": "source.python",
      "encoding": "utf8",
      "env": {"PYTHONIOENCODING": "utf-8", "LANG": "en_US.UTF-8"}
    }
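
To see what the PYTHONIOENCODING entry in the build file above changes, here is a minimal sketch, assuming Python 3 and that sys.executable is the interpreter you want to test. The helper name child_stdout_encoding is hypothetical. It starts a child interpreter whose stdout is a pipe (as it is when an editor captures the output) and reports the encoding that child picked.

```python
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding=None):
    """Return sys.stdout.encoding as seen by a child interpreter writing to a pipe.

    If io_encoding is given, it is passed to the child via PYTHONIOENCODING.
    """
    env = dict(os.environ)
    env.pop("PYTHONIOENCODING", None)  # start from a known state
    if io_encoding is not None:
        env["PYTHONIOENCODING"] = io_encoding
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        stdout=subprocess.PIPE,
        env=env,
        check=True,
    )
    # An encoding name is plain ASCII, so decoding it this way is safe.
    return result.stdout.decode("ascii").strip()


if __name__ == "__main__":
    print("default when piped:    ", child_stdout_encoding())
    print("with PYTHONIOENCODING: ", child_stdout_encoding("utf-8"))
```

The first value depends on your platform and locale, which is exactly why pinning the variable in the editor's build settings helps.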
    
  • 2020-11-22 02:14

    I had a similar issue last week. It was easy to fix in my IDE (PyCharm).

    Here was my fix:

    Starting from the PyCharm menu bar: File -> Settings... -> Editor -> File Encodings, then set "IDE Encoding", "Project Encoding" and "Default encoding for properties files" ALL to UTF-8. It now works like a charm.

    Hope this helps!

  • 2020-11-22 02:14

    I just thought I'd mention something here which I had to spend a long time experimenting with before I finally realised what was going on. This may be so obvious to everyone here that they haven't bothered mentioning it. But it would've helped me if they had, so on that principle...!

    NB: I am using Jython specifically, v 2.7, so just possibly this may not apply to CPython...

    NB2: the first two lines of my .py file here are:

    # -*- coding: utf-8 -*-
    from __future__ import print_function
    

    The "%" (AKA "interpolation operator") string construction mechanism causes ADDITIONAL problems too... If the default encoding of the "environment" is ASCII and you try to do something like

    print( "bonjour, %s" % "fréd" )  # Call this "print A"
    

    You will have no difficulty running in Eclipse... In a Windows CLI (DOS window) you will find that the encoding is code page 850 (my Windows 7 OS) or something similar, which can handle European accented characters at least, so it'll work.

    print( u"bonjour, %s" % "fréd" ) # Call this "print B"
    

    will also work.

    If, OTOH, you direct to a file from the CLI, the stdout encoding will be None, which will default to ASCII (on my OS anyway), which will not be able to handle either of the above prints... (dreaded encoding error).

    So then you might think of redirecting your stdout by using

    sys.stdout = codecs.getwriter('utf8')(sys.stdout)
    

    and try running in the CLI piping to a file... Very oddly, print A above will work... But print B above will throw the encoding error! The following will however work OK:

    print( u"bonjour, " + "fréd" ) # Call this "print C"
    

    The conclusion I have come to (provisionally) is that when a string marked as Unicode with the "u" prefix is passed to the %-handling mechanism, the operation appears to use the default environment encoding, regardless of whether you have redirected stdout!

    How people deal with this is a matter of choice. I would welcome a Unicode expert to say why this happens, whether I've got it wrong in some way, what the preferred solution to this is, whether it also applies to CPython, whether it happens in Python 3, etc., etc.
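
On the Python 3 part of that question: there, byte strings and text are separate types that are never combined implicitly, so a hidden decode with the default encoding (which is what CPython 2 does when a plain byte string meets a u"..." string) cannot happen. Whether Jython 2.7 behaves identically is not something this sketch settles. A minimal illustration, assuming Python 3 and UTF-8 encoded input; the helper name greet is hypothetical:

```python
def greet(raw_name, encoding="utf-8"):
    """Decode the received bytes once, then format using text only."""
    name = raw_name.decode(encoding)
    return "bonjour, %s" % name


raw = "fréd".encode("utf-8")  # bytes, as they might arrive from a file or pipe

# Concatenating text and bytes fails loudly instead of decoding silently.
try:
    "bonjour, " + raw
except TypeError:
    mixing_allowed = False
else:
    mixing_allowed = True

message = greet(raw)
print(message)
```

The design choice is the same one that makes "print C" above work: every piece is the same kind of string before it is combined.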

  • 2020-11-22 02:21

    Your code works when run in a script because Python encodes the output to whatever encoding your terminal application is using. If you are piping, you must encode it yourself.

    A rule of thumb is: Always use Unicode internally. Decode what you receive, and encode what you send.

    # -*- coding: utf-8 -*-
    print u"åäö".encode('utf-8')
    

    Another didactic example is a Python program to convert between ISO-8859-1 and UTF-8, making everything uppercase in between.

    import sys
    for line in sys.stdin:
        # Decode what you receive:
        line = line.decode('iso8859-1')
    
        # Work with Unicode internally:
        line = line.upper()
    
        # Encode what you send:
        line = line.encode('utf-8')
        sys.stdout.write(line)
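
The converter above is Python 2 code. As a rough Python 3 counterpart, here is a sketch that applies the same decode, process, encode steps to binary streams. The function name convert and its parameters are assumptions made for illustration; in a real filter you would pass sys.stdin.buffer and sys.stdout.buffer instead of the in-memory streams used here.

```python
import io


def convert(src, dst, in_encoding="iso8859-1", out_encoding="utf-8"):
    """Copy lines from binary stream src to binary stream dst, uppercasing the text."""
    for raw_line in src:
        # Decode what you receive:
        line = raw_line.decode(in_encoding)

        # Work with text (Unicode) internally:
        line = line.upper()

        # Encode what you send:
        dst.write(line.encode(out_encoding))


# Demonstration with in-memory byte streams standing in for stdin and stdout.
source = io.BytesIO("åäö\n".encode("iso8859-1"))
sink = io.BytesIO()
convert(source, sink)
result = sink.getvalue()
```

Working on the byte streams keeps both encodings explicit, so the result does not depend on whether the output goes to a terminal or a pipe.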
    

    Setting the system default encoding is a bad idea, because some modules and libraries you use may rely on it being ASCII. Don't do it.
