Python3: UnicodeEncodeError: 'ascii' codec can't encode character '\xfc'

太阳男子 2021-01-11 17:36

I'm trying to get a very simple example running on OS X with Python 3.5.1, but I'm really stuck. I have read many articles that deal with similar problems, but I can not

4 Answers
  • 2021-01-11 18:07

    Try running your script with the PYTHONIOENCODING environment variable explicitly set:

    PYTHONIOENCODING=utf-8 python3 script.py
    
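A minimal Python sketch of what the PYTHONIOENCODING suggestion above changes, assuming the error comes from a standard output stream that is using the ASCII codec. The helper name `write_with_encoding` is hypothetical; it wraps an in-memory buffer with `io.TextIOWrapper` to stand in for `sys.stdout` under a given encoding.

```python
import io


def write_with_encoding(text, encoding):
    """Write text to an in-memory stream that uses the given encoding.

    Stands in for sys.stdout configured with that encoding
    (hypothetical helper, for illustration only).
    """
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding)
    stream.write(text)
    stream.flush()
    return raw.getvalue()


word = "Gl\u00fcck"  # "Glück"

# An ASCII stream cannot represent U+00FC, which reproduces the error.
try:
    write_with_encoding(word, "ascii")
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = exc

# A UTF-8 stream, as selected by PYTHONIOENCODING=utf-8, encodes it fine.
utf8_bytes = write_with_encoding(word, "utf-8")

print(type(ascii_error).__name__)  # UnicodeEncodeError
print(utf8_bytes)                  # b'Gl\xc3\xbcck'
```

To see which codec the interpreter actually picked for the terminal, printing `sys.stdout.encoding` with and without the variable set is a quick check.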
  • 2021-01-11 18:09

    If you are facing this problem while reading or writing a file, try this:

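    # In Python 3 the built-in open() accepts an encoding argument,
    # so the codecs module is not needed here.
    
    # File read: decode the file's bytes as UTF-8
    with open(filename, 'r', encoding='utf-8') as f:
        text = f.read()
    
    # File write: encode the text as UTF-8
    with open(filename, 'w', encoding='utf-8') as f:
        f.write(text)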
    
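The file-handling advice above can be checked with a small round trip. This is a sketch under assumptions: the file name `words.txt` and the sample text are made up for illustration, and the file is created in a temporary directory so nothing on disk is touched.

```python
import os
import tempfile

text = "Gl\u00fcck, Spa\u00df, \u00c4hre"  # "Glück, Spaß, Ähre"

with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical file name, used only for this demonstration.
    path = os.path.join(tmpdir, "words.txt")

    # Write with an explicit encoding instead of the locale default.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # Inspect the raw bytes that ended up on disk.
    with open(path, "rb") as f:
        raw = f.read()

    # Read back with the same explicit encoding.
    with open(path, "r", encoding="utf-8") as f:
        round_trip = f.read()

print(raw)                  # the UTF-8 bytes, e.g. ü stored as \xc3\xbc
print(round_trip == text)   # True
```

Passing `encoding` explicitly makes the result independent of the locale settings of the machine running the script.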
  • 2021-01-11 18:13

    Your environment variables are set incorrectly. This works for me:

    echo "LC_ALL=en_US.UTF-8" >> /etc/environment
    echo "en_US.UTF-8 UTF-8" >> /etc/locale.gen
    echo "LANG=en_US.UTF-8" > /etc/locale.conf
    locale-gen en_US.UTF-8
    
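Before or after changing locale settings as described above, it can help to ask Python which encodings it derived from the environment. The following is a small diagnostic sketch; the function name `report_encodings` is hypothetical, and the values it prints depend entirely on the machine it runs on.

```python
import locale
import sys


def report_encodings():
    """Collect the encodings Python derived from the current environment."""
    return {
        # Encoding used by print(); may be None if stdout was replaced.
        "stdout": getattr(sys.stdout, "encoding", None),
        # Default used by open() when no encoding argument is given.
        "preferred": locale.getpreferredencoding(False),
        # Encoding used for file names.
        "filesystem": sys.getfilesystemencoding(),
    }


info = report_encodings()
for name in sorted(info):
    print("{}: {}".format(name, info[name]))
```

If `stdout` or `preferred` reports an ASCII codec (often shown as `ANSI_X3.4-1968` or `US-ASCII`), the locale is the likely cause of the `UnicodeEncodeError`.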
  • 2021-01-11 18:14

    Remove the characters < and >:

    # coding=latin-1
    

    Those characters are often used in examples to indicate where the encoding name goes, but the literal characters < and > should not be included in your file.

    For that to work, your file must be encoded using latin-1. If your file is actually encoded using utf-8, the encoding line should be

    # coding=utf-8
    

    For example, when I run this script (saved as a file with latin-1 encoding):

    # coding=latin-1
    
    mylist = [u'Glück', u'Spaß', u'Ähre',]
    print(mylist)
    
    for w in mylist:
        print(w.encode("latin-1"))
    

    I get this output (with no errors):

    ['Glück', 'Spaß', 'Ähre']
    b'Gl\xfcck'
    b'Spa\xdf'
    b'\xc4hre'
    

    That output looks correct. For example, the latin-1 encoding of ü is '\xfc'.

    I used my editor to save the file with latin-1 encoding. The contents of the file in hexadecimal are:

    $ hexdump -C  codec-question.py 
    00000000  23 20 63 6f 64 69 6e 67  3d 6c 61 74 69 6e 2d 31  |# coding=latin-1|
    00000010  0a 0a 6d 79 6c 69 73 74  20 3d 20 5b 75 27 47 6c  |..mylist = [u'Gl|
    00000020  fc 63 6b 27 2c 20 75 27  53 70 61 df 27 2c 20 75  |.ck', u'Spa.', u|
    00000030  27 c4 68 72 65 27 2c 5d  0a 70 72 69 6e 74 28 6d  |'.hre',].print(m|
    00000040  79 6c 69 73 74 29 0a 0a  66 6f 72 20 77 20 69 6e  |ylist)..for w in|
    00000050  20 6d 79 6c 69 73 74 3a  0a 20 20 20 20 70 72 69  | mylist:.    pri|
    00000060  6e 74 28 77 2e 65 6e 63  6f 64 65 28 22 6c 61 74  |nt(w.encode("lat|
    00000070  69 6e 2d 31 22 29 29 0a                           |in-1")).|
    00000078
    

    Note that the first byte (represented in hexadecimal) in the third line (i.e. the character at position 0x20) is fc. That is the latin-1 encoding of ü. If the file were encoded using utf-8, the character ü would be represented by two bytes, c3 bc.

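The byte-level difference described in the last answer can be reproduced without an editor or hexdump. This sketch builds a shortened version of the script in memory under both encodings; the helper `build_source` is hypothetical, and `tokenize.detect_encoding` from the standard library is used to read the declared encoding from the coding line.

```python
import io
import tokenize

TEMPLATE = "# coding={name}\nmylist = ['Gl\u00fcck']\n"


def build_source(name):
    """Return the script as bytes, declared and encoded with the same codec."""
    return TEMPLATE.format(name=name).encode(name)


latin1_file = build_source("latin-1")
utf8_file = build_source("utf-8")

# ü is one byte (fc) in latin-1 and two bytes (c3 bc) in utf-8.
print(b"\xfc" in latin1_file)      # True
print(b"\xc3\xbc" in utf8_file)    # True

# Read the declared encoding from the coding line of each file.
declared_latin1 = tokenize.detect_encoding(io.BytesIO(latin1_file).readline)[0]
declared_utf8 = tokenize.detect_encoding(io.BytesIO(utf8_file).readline)[0]
print(declared_latin1)  # iso-8859-1 (the normalized name for latin-1)
print(declared_utf8)    # utf-8

# If the declaration and the real encoding disagree, decoding fails:
try:
    latin1_file.decode("utf-8")
    mismatch_error = None
except UnicodeDecodeError as exc:
    mismatch_error = exc
print(type(mismatch_error).__name__)  # UnicodeDecodeError
```

The last step shows why the coding line has to match how the file was actually saved: the lone byte fc is not valid UTF-8.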