How to print() a string in Python3 without exceptions?

回眸只為那壹抹淺笑 submitted on 2019-12-10 16:02:44

Question


Seemingly simple question: How do I print() a string in Python3? Should be a simple:

print(my_string)

But that doesn't work reliably. Depending on the content of my_string, the environment variables, and the OS you use, it can raise a UnicodeEncodeError exception:

>>> print("\u3423")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character '\u3423' in position 0: ordinal not in range(128)

Is there a clean portable way to fix this?

To expand a bit: the problem here is that a Python3 string holds Unicode characters, while the terminal can use any encoding. If you are lucky, your terminal can handle every character in the string and everything is fine. If it can't (e.g. somebody set LANG=C), you get an exception.

If you manually encode a string in Python3 you can supply an error handler that ignores or replaces unencodable characters:

  "\u3423".encode("ascii", errors="replace")

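As a rough illustration of what the standard error handlers do to an unencodable character, the sketch below encodes a short sample string to ASCII with each of them. The sample string and the variable names are made up for this example.

```python
# Sample text: two ASCII letters around U+3423, which ASCII cannot represent.
text = "x\u3423y"

handlers = ["ignore", "replace", "backslashreplace", "xmlcharrefreplace"]

# Encode the same text once per handler and collect the resulting bytes.
results = {name: text.encode("ascii", errors=name) for name in handlers}

for name, encoded in results.items():
    # The repr of a bytes object is pure ASCII, so this print() is safe anywhere.
    print(name, encoded)
```

Every handler except the default "strict" changes the data in some way, which is the concern raised in the next paragraph.
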
For print() I don't see an easy way to plug in an error handler, and even if there is one, a plain error handler seems like a terrible idea because it would modify the data. A conditional error handler might work (i.e. check isatty() and decide what to do based on that), but it seems awfully hacky to go through all that trouble just to print() a string, and I am not even sure that it wouldn't fail in some cases.
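
The conditional idea could be sketched roughly as follows. The helpers printable() and safe_print() are hypothetical names invented for this illustration, and the policy (escape on a terminal, fail loudly otherwise) is only one possible choice, not an established recipe.

```python
import sys

def printable(text, encoding, lossy):
    """Return text in a form that `encoding` can represent.

    With lossy=True, unencodable characters become backslash escapes;
    with lossy=False, a UnicodeEncodeError is raised instead.
    """
    errors = "backslashreplace" if lossy else "strict"
    return text.encode(encoding, errors).decode(encoding)

def safe_print(text, stream=None):
    """Hypothetical helper: escape characters only when writing to a terminal."""
    stream = sys.stdout if stream is None else stream
    # Fall back to ASCII if the stream does not report an encoding (assumption).
    encoding = getattr(stream, "encoding", None) or "ascii"
    print(printable(text, encoding, lossy=stream.isatty()), file=stream)

# The escaped form is pure ASCII, so printing it cannot fail.
print(printable("x\u3423y", "ascii", lossy=True))
```

When the output is a pipe or a file, this sketch still raises, so the data is never altered silently; whether that is the right trade-off depends on the program.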

A real-world example of this problem is this one:

Python3: UnicodeEncodeError only when run from crontab


Answer 1:


Is there a clean portable way to fix this?

Set PYTHONIOENCODING=<encoding>:<error_handler> e.g.,

$ PYTHONIOENCODING=utf-8 python your_script.py >output-in-utf-8.txt
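
To see the effect of the variable without touching your shell configuration, one option is to start a child interpreter with PYTHONIOENCODING set and inspect the bytes it writes. This is a minimal sketch; run_child() is a hypothetical helper name, and the child program simply prints the character from the question.

```python
import os
import subprocess
import sys

def run_child(io_encoding):
    """Run a child interpreter that prints U+3423 and return its stdout bytes."""
    # Copy the current environment and override only PYTHONIOENCODING.
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    result = subprocess.run(
        [sys.executable, "-c", 'print("\\u3423")'],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # Strip the trailing newline, which differs between platforms.
    return result.stdout.strip()

utf8_bytes = run_child("utf-8")
escaped_bytes = run_child("ascii:backslashreplace")

print(utf8_bytes, escaped_bytes)
```

The part after the colon is the error handler, so "ascii:backslashreplace" keeps the output ASCII-only while still showing which character was there.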

In your case, I'd configure your environment (LANG, LC_CTYPE) to accept non-ascii input:

$ locale charmap



Answer 2:


The most practical way to solve this issue seems to be to force the output encoding to utf-8:surrogateescape. This will not only force UTF-8 output, but also ensure that surrogate-escaped strings, as returned by os.fsdecode(), can be printed without raising an exception. On the command line this looks like this:

PYTHONIOENCODING=utf-8:surrogateescape python3 -c 'print("\udcff")'

To do this from within the program itself, one has to reassign stdout and stderr. This can be done as follows (line_buffering=True is important, as otherwise the output won't get flushed properly):

import sys
import io

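# Without an explicit encoding, TextIOWrapper falls back to the locale's
# encoding, so pass encoding="utf-8" to actually force UTF-8 output.
sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding="utf-8", errors="surrogateescape", line_buffering=True)
sys.stderr = io.TextIOWrapper(sys.stderr.buffer, encoding="utf-8", errors="surrogateescape", line_buffering=True)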

print("\udcff")
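
The effect of the surrogateescape handler can be checked without replacing the real stdout by writing through a TextIOWrapper that sits on top of an in-memory buffer. The sketch below does that; encode_like_stdout() and the sample filename are invented for this illustration.

```python
import io

def encode_like_stdout(text, errors):
    """Write text through a UTF-8 TextIOWrapper and return the bytes produced."""
    raw = io.BytesIO()
    # newline="\n" disables platform-specific newline translation.
    wrapper = io.TextIOWrapper(raw, encoding="utf-8", errors=errors, newline="\n")
    wrapper.write(text)
    wrapper.flush()
    return raw.getvalue()

# Decoding the invalid byte 0xFF with surrogateescape yields "\udcff",
# which is how such a byte in a filename shows up in a Python string.
name = b"file-\xff".decode("utf-8", errors="surrogateescape")

# With surrogateescape on output, the original byte comes back unchanged.
round_trip = encode_like_stdout(name, "surrogateescape")

# With the default strict handler, the lone surrogate cannot be encoded.
try:
    encode_like_stdout(name, "strict")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

print(round_trip, strict_failed)
```

On Python 3.7 and later, sys.stdout.reconfigure(encoding="utf-8", errors="surrogateescape") should achieve much the same as the reassignment above without creating a new wrapper object, provided sys.stdout is still a regular TextIOWrapper.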

This approach will cause characters to be displayed incorrectly on terminals not set to UTF-8, but to me that seems strongly preferable to randomly raising exceptions and to making it impossible to print filenames without corrupting them, since on Linux systems they might not be in any valid encoding at all.

I read in a few places that utf-8:surrogateescape might become the default in the future, but as of Python 3.6.0b2 that is not the case.




Answer 3:


The reason it gives you an error is that Python is trying to interpret \u as an escape sequence, just as \r is a carriage return, \n is a newline, \t is a tab, and so on.

If:

 my_string = '\u112'
 print(my_string)

That will give you an error. To print the '\' without Python trying to interpret what follows it, double the backslash like so:

 my_string = '\\u122'
 print(my_string)

Output:

 \u122
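
As a small illustration of the difference between an escape sequence and a literal backslash, the sketch below compares three spellings of a similar-looking string. The variable names are made up for this example; note that only the first spelling produces a non-ASCII character.

```python
escaped = "\u3423"   # escape sequence: a single character, U+3423
literal = "\\u3423"  # doubled backslash: six ASCII characters
raw = r"\u3423"      # raw string: the same six ASCII characters

print(len(escaped), len(literal), literal == raw)

# The literal form contains only ASCII, so printing it works with any encoding.
print(literal)
```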


Source: https://stackoverflow.com/questions/22494825/how-to-print-a-string-in-python3-without-exceptions
