Python script receiving a UnicodeEncodeError: 'ascii' codec can't encode character


南笙 2021-01-25 16:16

I have a simple Python script that pulls posts from Reddit and posts them on Twitter. Unfortunately, tonight it began having issues that I'm assuming are because of someone's

3 answers

旧时难觅i (original poster)

2021-01-25 17:09
Consider this simple program:
```
print(u'\u201c' + "python")
```
If you try printing to a terminal (with an appropriate character encoding), you get
```
“python
```
However, if you try redirecting output to a file, you get a UnicodeEncodeError.
```
script.py > /tmp/out
Traceback (most recent call last):
  File "/home/unutbu/pybin/script.py", line 4, in <module>
    print(u'\u201c' + "python")
UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 0: ordinal not in range(128)
```
When you print to a terminal, Python uses the terminal's character encoding to encode unicode. (Terminals can only print bytes, so unicode must be encoded in order to be printed.)
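
To see which encoding Python has picked for the current output stream, a small check like the following can help. This is only a sketch: `describe_stdout` is a made-up helper name, not part of any library, and the values it reports depend on the interpreter version, the locale, and whether output is redirected.

```python
import sys


def describe_stdout(stream=None):
    """Report the encoding and terminal status of a stream (hypothetical helper)."""
    stream = sys.stdout if stream is None else stream
    # Some stream objects have no .encoding attribute, and it can also be None,
    # so fall back to None instead of raising.
    encoding = getattr(stream, "encoding", None)
    # isatty() is True only when the stream is attached to a terminal.
    is_tty = bool(stream.isatty()) if hasattr(stream, "isatty") else False
    return {"encoding": encoding, "isatty": is_tty}


if __name__ == "__main__":
    # Report on stderr so the message stays visible when stdout is redirected.
    sys.stderr.write(repr(describe_stdout()) + "\n")
```

Running the script once directly and once with `> /tmp/out` makes the difference between the two situations visible.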

When you redirect output to a file, Python cannot determine the character encoding, since files have no declared encoding. So by default Python 2 implicitly encodes all unicode with the ascii codec before writing to the file. Since u'\u201c' cannot be encoded as ascii, a UnicodeEncodeError is raised. (Only the first 128 Unicode code points, 0 through 127, can be encoded with ascii.)
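
The codec limit can be reproduced without any redirection by encoding the string by hand. The sketch below is illustrative only; `try_encode` is a hypothetical helper that returns either the encoded bytes or the error message.

```python
LEFT_QUOTE = u"\u201c"  # LEFT DOUBLE QUOTATION MARK, code point 0x201C


def try_encode(text, codec):
    """Return the encoded bytes, or the error message if the codec cannot encode text."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError as exc:
        return str(exc)


# The ascii codec only covers code points 0 through 127, so this yields an error message.
ascii_result = try_encode(LEFT_QUOTE + u"python", "ascii")
# UTF-8 can represent every code point, so this yields bytes.
utf8_result = try_encode(LEFT_QUOTE + u"python", "utf-8")

print(repr(ascii_result))
print(repr(utf8_result))
```

The ascii attempt fails with the same "ordinal not in range(128)" message as in the traceback above, while the UTF-8 attempt succeeds.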

This issue is explained in detail in the Why Print Fails wiki.

To fix the problem, first, avoid concatenating unicode and byte strings. This causes implicit conversion using the ascii codec in Python 2, and an exception in Python 3. To future-proof your code, it is better to be explicit. For example, encode post explicitly before formatting and printing the bytes:
```
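# Assuming post_dict is keyed by the unicode text, look up the entry before
# encoding, since the encoded byte string would no longer match that key.
value = post_dict[post]
print('{} {} #python'.format(post.encode('utf-8'), value))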
```
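
The same idea, encoding explicitly at the point of output, can be written so that it does not depend on the stream's default codec at all. The following is a minimal sketch rather than a definitive fix; `write_utf8` is a hypothetical helper, and choosing UTF-8 for the output file is an assumption.

```python
import io
import sys


def write_utf8(text, stream=None):
    """Encode text as UTF-8 and write the bytes to the stream (hypothetical helper)."""
    stream = sys.stdout if stream is None else stream
    data = (text + u"\n").encode("utf-8")
    # Python 3 text streams expose the underlying byte stream as .buffer;
    # byte streams (and Python 2's sys.stdout) accept the bytes directly.
    target = getattr(stream, "buffer", stream)
    target.write(data)
    target.flush()
    return data


# Demonstration against an in-memory byte stream instead of a real file.
demo = io.BytesIO()
written = write_utf8(u"\u201c" + u"python", demo)
```

Calling `write_utf8(text)` with no stream writes to standard output, so the script should behave the same whether or not its output is redirected.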