UnicodeDecodeError: 'utf8' codec can't decode byte 0x9c

暖寄归人

I have a socket server that is supposed to receive valid UTF-8 characters from clients.

The problem is some clients (mainly hackers) are sending all the wrong kind of

10 Answers

滥情空心

2020-11-22 15:07
http://docs.python.org/howto/unicode.html#the-unicode-type
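```
# Python 2: decode with the default codec (normally ASCII),
# replacing each undecodable byte with U+FFFD
text = unicode(data, errors='replace')
```
or
```
# Python 2: same, but silently drop undecodable bytes
text = unicode(data, errors='ignore')
```
Note: `errors='ignore'` strips out the offending bytes and returns the string without them; `errors='replace'` substitutes the replacement character U+FFFD (�) for each one instead.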

For me this is the ideal case, since I'm using it as protection against non-ASCII input, which my application does not allow.
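`unicode()` exists only in Python 2. Below is a minimal Python 3 sketch of the same idea for the socket-server case in the question, assuming the server receives raw `bytes` in chunks. It uses an incremental decoder so that a valid multi-byte character split across two reads is not mangled, while genuinely invalid bytes such as `0x9c` become U+FFFD. The function name `decode_chunks` is hypothetical.

```python
import codecs
from typing import Iterable


def decode_chunks(chunks: Iterable[bytes], errors: str = "replace") -> str:
    """Decode UTF-8 byte chunks (e.g. successive socket.recv() results) to text.

    errors="replace" puts U+FFFD in place of each invalid byte sequence;
    errors="ignore" drops it instead.
    """
    decoder = codecs.getincrementaldecoder("utf-8")(errors=errors)
    parts = [decoder.decode(chunk) for chunk in chunks]
    # final=True flushes a sequence left incomplete at the end of the stream
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


# "é" (b"\xc3\xa9") is split across two chunks; b"\x9c" is invalid UTF-8
chunks = [b"caf\xc3", b"\xa9 \x9c!"]
print(ascii(decode_chunks(chunks)))            # 'caf\xe9 \ufffd!'
print(ascii(decode_chunks(chunks, "ignore")))  # 'caf\xe9 !'
```

Calling `bytes.decode('utf-8', errors='replace')` on each `recv()` chunk separately would also avoid the exception, but it would turn a character split across two reads into replacement characters.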

Alternatively, use `open` from the `codecs` module to read the file:
```
import codecs

# Read the file as UTF-8, silently dropping any undecodable bytes
with codecs.open(file_name, 'r', encoding='utf-8',
                 errors='ignore') as fdata:
    text = fdata.read()
```
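In Python 3 the built-in `open` accepts the same `encoding` and `errors` arguments, so `codecs.open` is not strictly needed. A small self-contained sketch that writes a throwaway file containing the invalid byte `0x9c` (the file contents are made up for illustration) and reads it back:

```python
import os
import tempfile

# Hypothetical input file: ASCII text plus one stray 0x9c byte
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as tmp:
    tmp.write(b"hello \x9cworld\n")
    file_name = tmp.name

try:
    # errors="ignore" drops the undecodable byte instead of raising
    with open(file_name, "r", encoding="utf-8", errors="ignore") as fdata:
        text = fdata.read()
finally:
    os.remove(file_name)

print(text)  # hello world
```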
梦毁少年i

2020-11-22 15:07
Changing the `read_csv` parser engine from C to Python did the trick for me.

Engine is C:
```
import pandas as pd

pd.read_csv(gdp_path, sep='\t', engine='c')
```
Error: `UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 18: invalid start byte`

Engine is Python:
```
pd.read_csv(gdp_path, sep='\t', engine='python')
```
No errors for me.
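Byte `0x92` is not a valid UTF-8 start byte, but in Windows-1252 (cp1252) it is the right single quotation mark (’), which suggests the file is cp1252 rather than UTF-8. Assuming that is the case, a more explicit fix is to name the encoding; `pd.read_csv` accepts an `encoding=` argument for this (e.g. `encoding='cp1252'`). The sketch below shows the same idea with the standard-library `csv` module; the sample data is made up.

```python
import csv
import os
import tempfile

# Hypothetical tab-separated file saved as cp1252: 0x92 is a "smart" apostrophe
raw = b"country\tnote\nFrance\tit\x92s fine\n"

try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc.reason)  # invalid start byte

with tempfile.NamedTemporaryFile(delete=False, suffix=".tsv") as tmp:
    tmp.write(raw)
    gdp_path = tmp.name

try:
    # Decode explicitly as cp1252 instead of relying on a default encoding
    with open(gdp_path, encoding="cp1252", newline="") as f:
        rows = list(csv.reader(f, delimiter="\t"))
finally:
    os.remove(gdp_path)

print(ascii(rows))  # [['country', 'note'], ['France', 'it\u2019s fine']]
```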
生来不讨喜

2020-11-22 15:08
I had the same problem with UnicodeDecodeError and solved it with this line. I don't know if it is the best way, but it worked for me.
```
# Python 2: interpret the bytes with the 'unicode_escape' codec, then re-encode as UTF-8
data = data.decode('unicode_escape').encode('utf-8')
```
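A caveat on `unicode_escape`, sketched in Python 3 syntax: the codec reads every byte that is not part of a backslash escape as Latin-1 and also interprets backslash sequences. It therefore never fails on a byte like `0x9c`, but it can silently change the data. The inputs below are illustrative.

```python
# 1. Valid UTF-8 input is mis-decoded: each byte is read as Latin-1 (mojibake)
utf8_bytes = "café".encode("utf-8")      # b'caf\xc3\xa9'
mangled = utf8_bytes.decode("unicode_escape")
print(mangled == "caf\u00c3\u00a9")      # True, i.e. "cafÃ©" instead of "café"

# 2. Literal backslash sequences are interpreted rather than preserved
path_bytes = b"C:\\new"                  # the six characters C : \ n e w
decoded = path_bytes.decode("unicode_escape")
print(decoded == "C:\new")               # True: backslash-n became a real newline

# 3. A trailing lone backslash is not a valid escape and still raises
try:
    b"oops\\".decode("unicode_escape")
except UnicodeDecodeError as exc:
    print(type(exc).__name__)            # UnicodeDecodeError
```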
你的背包

2020-11-22 15:13
What can you do if you need to make a change to a file, but don’t know the file’s encoding? If you know the encoding is ASCII-compatible and only want to examine or modify the ASCII parts, you can open the file with the surrogateescape error handler:
```
with open(fname, 'r', encoding="ascii", errors="surrogateescape") as f:
    data = f.read()
```
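The point of `surrogateescape` is that undecodable bytes survive a round trip: each one is carried through as a lone surrogate code point (byte `0x9c` becomes U+DC9C) and turned back into the original byte when you encode with the same error handler. A minimal in-memory sketch; writing the result back to a file opened with `encoding="ascii", errors="surrogateescape"` works the same way. The sample bytes are made up.

```python
# Raw bytes of unknown encoding: ASCII text plus a non-ASCII byte 0x9c
original = b"name=\x9c; id=42"

text = original.decode("ascii", errors="surrogateescape")
print(ascii(text))  # 'name=\udc9c; id=42'

# Modify only the ASCII part
edited = text.replace("42", "43")

# Encoding with the same handler restores the unknown byte unchanged
restored = edited.encode("ascii", errors="surrogateescape")
print(restored)  # b'name=\x9c; id=43'
```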
伪装坚强ぢ

2020-11-22 15:18
```
>>> '\x9c'.decode('cp1252')
u'\u0153'
>>> print '\x9c'.decode('cp1252')
œ
```
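The session above is Python 2. A Python 3 version of the same check, which also shows why the original error occurs: `0x9c` is a UTF-8 continuation byte, so it cannot start a character, whereas cp1252 maps it to œ (U+0153). Whether the client data really is cp1252 is an assumption you would need to confirm.

```python
data = b"\x9c"

try:
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc.reason)  # invalid start byte

decoded = data.decode("cp1252")
print(ascii(decoded))  # '\u0153', i.e. 'œ'
```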
独厮守ぢ

2020-11-22 15:26
This type of issue crops up for me now that I've moved to Python 3. I had no idea Python 2 was simply steamrolling over any issues with file encoding.

I found this nice explanation of the differences and how to find a solution after none of the above worked for me.

http://python-notes.curiousefficiency.org/en/latest/python3/text_file_processing.html

In short, to make Python 3 behave as similarly as possible to Python 2, use:
```
with open(filename, encoding="latin-1") as datafile:
    # work on datafile here, e.g. read it all at once
    data = datafile.read()
```
However, do read the article: there is no one-size-fits-all solution.
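Why `latin-1` behaves like Python 2: it maps each of the 256 byte values to the code point with the same number, so decoding never fails and encoding back returns the exact bytes. The trade-off is that data in another encoding is decoded to the wrong characters without any warning, as this sketch shows for byte `0x9c`:

```python
all_bytes = bytes(range(256))

# Decoding as Latin-1 always succeeds, one character per byte...
text = all_bytes.decode("latin-1")
print(len(text))                             # 256

# ...and round-trips losslessly
print(text.encode("latin-1") == all_bytes)   # True

# But 0x9c becomes U+009C (an invisible C1 control character),
# not the 'œ' (U+0153) that a cp1252 file would mean
print(ascii(b"\x9c".decode("latin-1")), ascii(b"\x9c".decode("cp1252")))  # '\x9c' '\u0153'
```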