UnicodeDecodeError: 'utf8' codec can't decode byte 0x9c

暖寄归人 2020-11-22 14:49

I have a socket server that is supposed to receive valid UTF-8 characters from clients.

The problem is some clients (mainly hackers) are sending all the wrong kind of

10 Answers
  • 2020-11-22 15:07

    http://docs.python.org/howto/unicode.html#the-unicode-type

    str = unicode(str, errors='replace')
    

    or

    str = unicode(str, errors='ignore')
    

    Note: errors='ignore' strips out the characters in question and returns the string without them.

    For me this is the ideal case, since I'm using it as protection against non-ASCII input, which my application does not allow.
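In Python 3 there is no `unicode` built-in; the same error handlers are passed to `bytes.decode` instead. A minimal sketch, assuming the data arrives as `bytes` (for example from a socket); the sample payload is made up for illustration:

```python
# Hypothetical payload: valid ASCII with one stray 0x9c byte in the middle.
raw = b"hello\x9cworld"

# errors='replace' substitutes U+FFFD (the replacement character) for the bad byte.
replaced = raw.decode("utf-8", errors="replace")

# errors='ignore' drops the bad byte entirely.
ignored = raw.decode("utf-8", errors="ignore")

print(ascii(replaced))
print(ascii(ignored))
```

Using 'replace' keeps a visible marker where data was lost, which can make bad input easier to spot in logs than 'ignore'.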

    Alternatively, use the open function from the codecs module to read the file:

    import codecs

    # Undecodable bytes are silently dropped because of errors='ignore'.
    with codecs.open(file_name, 'r', encoding='utf-8',
                     errors='ignore') as fdata:
        data = fdata.read()
    
  • 2020-11-22 15:07

    Changing the engine from C to Python did the trick for me.

    Engine is C:

    pd.read_csv(gdp_path, sep='\t', engine='c')
    

    'utf-8' codec can't decode byte 0x92 in position 18: invalid start byte

    Engine is Python:

    pd.read_csv(gdp_path, sep='\t', engine='python')
    

    No errors for me.
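Byte 0x92 cannot start a UTF-8 sequence, but in Windows-1252 it is the right single quotation mark, so the file may simply not be UTF-8. A sketch, under that assumption, that tries candidate encodings in order; `decode_with_fallback` and its candidate list are hypothetical helpers, not part of pandas:

```python
def decode_with_fallback(raw, encodings=("utf-8", "cp1252")):
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for name in encodings:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a code point, so this last resort cannot fail,
    # although the result may not be the text the author intended.
    return raw.decode("latin-1"), "latin-1"


text, used = decode_with_fallback(b"it\x92s")
print(used, ascii(text))
```

If a sample of the file decodes cleanly with one of the candidates, that name can be passed to `read_csv` through its `encoding` parameter.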

  • 2020-11-22 15:08

    I had the same problem with UnicodeDecodeError and I solved it with this line. I don't know if it is the best way, but it worked for me.

    str = str.decode('unicode_escape').encode('utf-8')
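For context, a short Python 3 sketch of what the `unicode_escape` codec does: it maps each raw byte to the code point of the same value (as Latin-1 would) and also interprets backslash escapes, so input that contains literal backslashes is changed. The sample bytes are illustrative only:

```python
# Contains byte 0xe9 and a literal backslash followed by 'n'.
raw = b"caf\xe9 \\n end"

text = raw.decode("unicode_escape")

# Byte 0xe9 became U+00E9, and the backslash + 'n' pair became a real newline.
print(ascii(text))

# Re-encoding as UTF-8 therefore does not give back the original bytes.
changed = text.encode("utf-8") != raw
print(changed)
```

This avoids the exception, but it is only a faithful conversion if the input really is Latin-1 text without backslashes.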
    
  • 2020-11-22 15:13

    What can you do if you need to make a change to a file, but don’t know the file’s encoding? If you know the encoding is ASCII-compatible and only want to examine or modify the ASCII parts, you can open the file with the surrogateescape error handler:

    with open(fname, 'r', encoding="ascii", errors="surrogateescape") as f:
        data = f.read()
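The same handler works on in-memory bytes, which makes the round trip easy to see. A minimal sketch with made-up data; when writing a file back, the output file would need the same `encoding` and `errors` arguments:

```python
raw = b"name=\x9c\n"  # hypothetical data with one non-ASCII byte

# Undecodable bytes become lone surrogates (0x9c -> U+DC9C) instead of raising.
text = raw.decode("ascii", errors="surrogateescape")

# Edit only the ASCII part.
text = text.replace("name", "key")

# Encoding with the same handler turns the surrogates back into the original bytes.
out = text.encode("ascii", errors="surrogateescape")
print(out)
```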
    
  • 2020-11-22 15:18
    >>> '\x9c'.decode('cp1252')
    u'\u0153'
    >>> print '\x9c'.decode('cp1252')
    œ
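The session above is Python 2. A rough Python 3 equivalent, which also shows why UTF-8 rejects the byte, might look like this:

```python
raw = b"\x9c"

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    # 0x9c is 0b10011100, a continuation-byte pattern, so it cannot start a sequence.
    print("UTF-8 failed at position", exc.start)

# In Windows-1252 the same byte is the ligature U+0153.
char = raw.decode("cp1252")
print(ascii(char), hex(ord(char)))
```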
    
  • 2020-11-22 15:26

    This type of issue crops up for me now that I've moved to Python 3. I had no idea Python 2 was simply steamrolling over any issues with file encoding.

    I found this nice explanation of the differences and how to find a solution after none of the above worked for me.

    http://python-notes.curiousefficiency.org/en/latest/python3/text_file_processing.html

    In short, to make Python 3 behave as similarly as possible to Python 2, use:

    with open(filename, encoding="latin-1") as datafile:
        # work on datafile here
    

    However, read the article; there is no one-size-fits-all solution.
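A small sketch of why latin-1 never raises, and what it costs. It relies only on built-in codecs; the sample bytes are illustrative:

```python
all_bytes = bytes(range(256))

# Decoding as latin-1 cannot raise: each byte becomes the code point of the same value.
text = all_bytes.decode("latin-1")

# The mapping is one-to-one, so encoding restores the exact bytes.
round_trip = text.encode("latin-1")
print(round_trip == all_bytes)

# The catch: UTF-8 input is silently misread rather than rejected.
misread = b"\xc5\x93".decode("latin-1")
print(ascii(misread))
```

The two bytes in the last step are the UTF-8 encoding of U+0153, yet latin-1 returns two unrelated characters without any error.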
