Setting the correct encoding when piping stdout in Python

迷失自我 2020-11-22 01:21

When piping the output of a Python program, the Python interpreter gets confused about encoding and sets it to None. This means a program like this:

# -*- co         


        
10 Answers
  • 2020-11-22 01:56
    export PYTHONIOENCODING=utf-8
    

    does the job, but it can't be set from within Python itself ...

    What we can do is check whether an encoding is set and, if not, tell the user to set it before calling the script:

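    import sys

    if __name__ == '__main__':
        # When stdout is piped, Python 2 leaves sys.stdout.encoding as None.
        if sys.stdout.encoding is None:
            sys.stderr.write("Please set the environment variable PYTHONIOENCODING=UTF-8 "
                             "(for example: export PYTHONIOENCODING=UTF-8) when writing to stdout.\n")
            sys.exit(1)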
    

    Update in reply to the comment: the problem only exists when stdout is piped. I tested on Fedora 25 with Python 2.7.13:

    python --version
    Python 2.7.13
    

    cat b.py

    #!/usr/bin/env python
    #-*- coding: utf-8 -*-
    import sys
    
    print sys.stdout.encoding
    

    running ./b.py

    UTF-8
    

    running ./b.py | less

    None
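
The effect of PYTHONIOENCODING on a piped child process can be checked with a small script. This is only a sketch and assumes Python 3, where a piped sys.stdout gets a locale-based encoding rather than None; the helper name child_stdout_encoding is made up for illustration.

```python
import codecs
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding=None):
    """Run a child interpreter with stdout piped and return the normalized
    codec name it reports for sys.stdout.encoding."""
    env = dict(os.environ)
    # Start from a clean slate so the caller's own setting does not leak in.
    env.pop("PYTHONIOENCODING", None)
    if io_encoding is not None:
        env["PYTHONIOENCODING"] = io_encoding
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
    )
    reported = out.decode("ascii").strip()
    # Normalize spellings such as "UTF-8", "utf8" or "utf_8" to one codec name.
    return codecs.lookup(reported).name


if __name__ == "__main__":
    print("without PYTHONIOENCODING:", child_stdout_encoding())
    print("with PYTHONIOENCODING=utf-8:", child_stdout_encoding("utf-8"))
```

Because the child's stdout is a pipe here, this mirrors the `./b.py | less` case rather than the terminal case.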
    
  • 2020-11-22 01:59

    First, regarding this solution:

    # -*- coding: utf-8 -*-
    print u"åäö".encode('utf-8')
    

    It's not practical to explicitly print with a given encoding every time. That would be repetitive and error-prone.

    A better solution is to change sys.stdout at the start of your program so that it encodes with a selected encoding. Here is one solution I found in "Python: How is sys.stdout.encoding chosen?", in particular in a comment by "toka":

    import sys
    import codecs
    sys.stdout = codecs.getwriter('utf8')(sys.stdout)
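
To see what the codecs.getwriter wrapper actually does, it can be pointed at an in-memory byte stream instead of sys.stdout. The sketch below assumes Python 3; make_utf8_writer is a hypothetical helper name, and on Python 3 the real target would have to be a binary stream such as sys.stdout.buffer rather than sys.stdout itself.

```python
import codecs
import io


def make_utf8_writer(byte_stream):
    """Return a writer that accepts text and writes UTF-8 bytes to byte_stream."""
    return codecs.getwriter("utf-8")(byte_stream)


# Stand-in for a piped stdout: any object with a binary write() method works.
raw = io.BytesIO()
writer = make_utf8_writer(raw)
writer.write("åäö\n")

# The underlying stream now holds the encoded bytes, two per character here.
encoded = raw.getvalue()
```

On Python 3.7 and later, sys.stdout.reconfigure(encoding='utf-8') is another way to get the same effect without replacing the stream object.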
    
  • 2020-11-22 02:03

    You may want to try changing the environment variable "PYTHONIOENCODING" to "utf_8". I have written a page on my ordeal with this problem.

    TL;DR of the blog post:

    import sys, locale, os
    print(sys.stdout.encoding)
    print(sys.stdout.isatty())
    print(locale.getpreferredencoding())
    print(sys.getfilesystemencoding())
    print(os.environ["PYTHONIOENCODING"])
    print(chr(246), chr(9786), chr(9787))
    

    gives you

    utf_8
    False
    ANSI_X3.4-1968
    ascii
    utf_8
    ö ☺ ☻
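
The same diagnostics can be gathered into one dictionary, which is handy for logging. This is a sketch, not the blog post's code: io_encoding_report is a made-up name, and it uses os.environ.get so that an unset PYTHONIOENCODING yields None instead of raising KeyError.

```python
import locale
import os
import sys


def io_encoding_report():
    """Collect the encoding-related settings that influence stdout."""
    return {
        # getattr guards against replaced stdout objects lacking the attribute.
        "stdout_encoding": getattr(sys.stdout, "encoding", None),
        "stdout_isatty": sys.stdout.isatty(),
        "preferred_encoding": locale.getpreferredencoding(),
        "filesystem_encoding": sys.getfilesystemencoding(),
        # None when the variable is not set at all.
        "PYTHONIOENCODING": os.environ.get("PYTHONIOENCODING"),
    }


if __name__ == "__main__":
    for key, value in io_encoding_report().items():
        print(key, "=", value)
```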
    
  • 2020-11-22 02:05

    An arguably sanitized version of Craig McQueen's answer.

    import sys, codecs
    class EncodedOut:
        def __init__(self, enc):
            self.enc = enc
            self.stdout = sys.stdout
        def __enter__(self):
            if sys.stdout.encoding is None:
                w = codecs.getwriter(self.enc)
                sys.stdout = w(sys.stdout)
        def __exit__(self, exc_ty, exc_val, tb):
            sys.stdout = self.stdout
    

    Usage:

    with EncodedOut('utf-8'):
        print u'ÅÄÖåäö'
    
  • 2020-11-22 02:06

    I ran into this problem in a legacy application, and it was difficult to identify what was printed where. I helped myself with this hack:

    # encoding_utf8.py
    import builtins


    def print_utf8(fn):
        # Wrap a print-like function so every positional argument is UTF-8 encoded first.
        def print_fn(*args, **kwargs):
            return fn(*(str(arg).encode('utf-8') for arg in args), **kwargs)
        return print_fn


    builtins.print = print_utf8(print)
    

    At the top of my script, test.py:

    import encoding_utf8
    string = 'Axwell Λ Ingrosso'
    print(string)
    

    Note that this changes ALL calls to print to use an encoding, so your console will print this:

    $ python test.py
    b'Axwell \xce\x9b Ingrosso'
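
The output above is the repr of a bytes object, because print() in Python 3 converts bytes to text with str(). If the goal is to emit the raw UTF-8 bytes instead, one possible variation (not the hack above, and with a hypothetical stream parameter added for testing) is to write to a binary stream directly:

```python
import io
import sys


def print_utf8(*args, sep=" ", end="\n", stream=None):
    """Join the arguments like print() does, then write them as UTF-8 bytes."""
    if stream is None:
        # The binary layer underneath sys.stdout; assumes a regular stdout.
        stream = sys.stdout.buffer
    text = sep.join(str(arg) for arg in args) + end
    stream.write(text.encode("utf-8"))
    stream.flush()


# Demonstration against an in-memory stream rather than the real stdout.
demo = io.BytesIO()
print_utf8("Axwell Λ Ingrosso", stream=demo)
```

This keeps the bytes identical to those shown in the repr, but they reach the terminal or pipe as actual characters.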
    
  • 2020-11-22 02:10

    I could "automate" it with a call to:

    def __fix_io_encoding(last_resort_default='UTF-8'):
        import sys
        # Only act when at least one standard stream has no encoding (e.g. when piped).
        if [x for x in (sys.stdin, sys.stdout, sys.stderr) if x.encoding is None]:
            import os
            defEnc = None
            try:
                import locale
                defEnc = locale.getpreferredencoding()
            except Exception:
                pass
            if defEnc is None:
                try:
                    defEnc = sys.getfilesystemencoding()
                except Exception:
                    pass
            if defEnc is None:
                try:
                    defEnc = sys.stdin.encoding
                except Exception:
                    pass
            if defEnc is None:
                defEnc = last_resort_default
            # Keep an existing PYTHONIOENCODING; otherwise use the detected default.
            os.environ['PYTHONIOENCODING'] = os.environ.get("PYTHONIOENCODING", defEnc)
            # Re-execute the script so the new interpreter picks up the variable.
            os.execvpe(sys.argv[0], sys.argv, os.environ)
    __fix_io_encoding() ; del __fix_io_encoding
    

    Yes, it's possible to get an infinite loop here if this "setenv" fails.
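
One way to rule out that infinite loop is to mark the environment before re-executing and to refuse a second attempt when the marker is present. The following is a sketch under stated assumptions: the marker name IO_ENCODING_REEXEC_DONE and both function names are invented for illustration, and the decision logic is kept separate from the exec call so it can be tested.

```python
import os
import sys

# Hypothetical marker variable; any name unlikely to collide would do.
_SENTINEL = "IO_ENCODING_REEXEC_DONE"


def reexec_environment(environ, encodings, default="UTF-8"):
    """Return the environment for a re-exec, or None if no re-exec is needed."""
    if environ.get(_SENTINEL) == "1":
        return None  # already tried once; never loop
    if all(enc is not None for enc in encodings):
        return None  # every stream already has an encoding
    new_env = dict(environ)
    # Respect an existing PYTHONIOENCODING, otherwise fall back to the default.
    new_env.setdefault("PYTHONIOENCODING", default)
    new_env[_SENTINEL] = "1"
    return new_env


def fix_io_encoding():
    """Re-execute the current script once if a standard stream lacks an encoding."""
    streams = (sys.stdin, sys.stdout, sys.stderr)
    encodings = [getattr(s, "encoding", None) for s in streams]
    new_env = reexec_environment(os.environ, encodings)
    if new_env is not None:
        os.execve(sys.executable, [sys.executable] + sys.argv, new_env)
```

Calling fix_io_encoding() at the top of a script would then re-run it at most once, even if setting the variable does not change the stream encodings.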
