Understanding Python Unicode and Linux terminal

伪装坚强ぢ 2020-12-09 21:46

I have a Python script that writes some strings with UTF-8 encoding. In my script I mainly use the str() function to convert values to strings. It looks like this:

2 Answers
  • 2020-12-09 22:31

    If it outputs to the terminal then Python can examine the value of $LANG to pick a charset. All bets are off if you redirect.
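    To see which values are in play, a small sketch like the following can be run once in the terminal and once with output redirected. It is written for Python 3, and the helper name describe_output_encoding is hypothetical, used only for illustration. It reports $LANG, whether stdout is a terminal, and the encoding Python attached to stdout.

```python
import locale
import os
import sys


def describe_output_encoding():
    """Collect the values Python can consult when choosing an output charset."""
    stdout = sys.stdout
    return {
        # $LANG may be unset, in which case this is None.
        "LANG": os.environ.get("LANG"),
        # True only when stdout is attached to a terminal.
        "stdout_isatty": bool(getattr(stdout, "isatty", lambda: False)()),
        # Encoding Python chose for stdout (can be None on some streams).
        "stdout_encoding": getattr(stdout, "encoding", None),
        # Encoding derived from the current locale settings.
        "preferred_encoding": locale.getpreferredencoding(False),
    }


info = describe_output_encoding()
for key, value in info.items():
    print(f"{key}: {value}")
```

    Comparing the output of a direct run with a run whose output is redirected to a file shows whether the two cases differ on your system.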

  • 2020-12-09 22:37

    The terminal has a character set, and Python knows what that character set is, so it will automatically encode your Unicode strings to the byte encoding that the terminal uses, in your case UTF-8.

    But when you redirect, you are no longer using the terminal. You are now just using a Unix pipe. That Unix pipe doesn't have a charset, and Python has no way of knowing which encoding you now want, so it will fall back to a default character set. You have marked your question with "Python-3.x" but your print syntax is Python 2, so I suspect you are actually using Python 2. In that case sys.getdefaultencoding() is generally 'ascii', and in your case it definitely is. And of course, you cannot encode Japanese characters as ASCII, so you get an error.
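    The failure itself is easy to reproduce. The sketch below uses Python 3 syntax and a hypothetical sample string (the question's actual data is not shown); it tries to encode Japanese text as ASCII and then as UTF-8.

```python
text = "日本語"  # hypothetical sample string, not taken from the question

try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError as exc:
    # ASCII only covers code points 0-127, so this branch is taken.
    ascii_ok = False
    print("ASCII failed:", exc.reason)

# UTF-8 can represent every Unicode code point, so this succeeds.
utf8_bytes = text.encode("utf-8")
print("UTF-8 byte count:", len(utf8_bytes))
```

    This is the same kind of UnicodeEncodeError that Python 2 raises when it falls back to 'ascii' for redirected output.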

    Your best bet when using Python 2 is to encode the string with UTF-8 before printing it. Then redirection will work, and the resulting file will be UTF-8. The drawback is that the output will be wrong if your terminal uses some other encoding. To handle that, you can get the terminal encoding from sys.stdout.encoding and use it (it will be None when redirecting under Python 2).
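    One way to combine the two suggestions is to use the stream's encoding when it has one and fall back to UTF-8 when it is None. The following is a minimal sketch in Python 3 syntax; encode_for_output is a hypothetical helper, and an in-memory byte stream stands in for the redirected output.

```python
import io


def encode_for_output(text, stream_encoding, fallback="utf-8"):
    """Encode text with the stream's encoding, or with fallback when it is None."""
    return text.encode(stream_encoding or fallback)


# Simulate redirected output: a byte stream that reports no encoding.
redirected = io.BytesIO()
redirected.write(encode_for_output("日本語\n", None))

print("bytes written:", len(redirected.getvalue()))
```

    In a real script you would pass sys.stdout.encoding as stream_encoding. Under Python 2 the encoded byte string can be printed directly; under Python 3 it would be written to sys.stdout.buffer instead.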

    In Python 3, your code should work as is, except that you need to change print mystring to print(mystring).
