Which character encoding is the IPython terminal using?

泪湿孤枕 submitted on 2019-12-24 00:22:46

Question


I used to think I had this whole encoding business pretty much figured out. Apparently I was wrong, because I can't explain what's happening here.

I was trying to use the tabulate module to print a nicely formatted table with

from tabulate import tabulate
s = tabulate([[1,2],[3,4]], ["x","y"], tablefmt="fancy_grid")
print(s)

in IPython 3.5.0's interactive console under Windows 10. I expected the result to be

╒═════╤═════╕
│   x │   y │
╞═════╪═════╡
│   1 │   2 │
├─────┼─────┤
│   3 │   4 │
╘═════╧═════╛

but instead, I got a

UnicodeEncodeError: 'charmap' codec can't encode character '\u2552' in position 0: character maps to <undefined>

Puzzled, I tried to find out where the problem was and looked at the repr of the string:

In [15]: s
Out[15]: '╒═════╤═════╕\n│   x │   y │\n╞═════╪═════╡\n│   1 │   2 │\n├─────┼─────┤\n│   3 │   4 │\n╘═════╧═════╛'

Hmm, all the characters can be displayed by the terminal (even the first one that triggered the error).

Just checking some details:

In [16]: sys.stdout.encoding
Out[16]: 'cp850'

In [17]: s.encode("cp850")
[...]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2552' in position 0: character maps to <undefined>

So which encoding is the terminal using? Python says it's cp850, and it tells me that cp850 doesn't have that character (which is true: it's one of the cp437 box-drawing characters that had to make room for accented letters), but I can see it in the terminal window!
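The cp437-versus-cp850 point can be checked directly in Python. This is a minimal sketch, assuming the table string is copied literally from the expected output above so that tabulate is not needed; `unencodable` is a hypothetical helper, not part of any library.

```python
# The table string from the question, written out literally so this
# sketch does not need the tabulate package.
s = ("╒═════╤═════╕\n"
     "│   x │   y │\n"
     "╞═════╪═════╡\n"
     "│   1 │   2 │\n"
     "├─────┼─────┤\n"
     "│   3 │   4 │\n"
     "╘═════╧═════╛")


def unencodable(text, codec):
    """Return the distinct characters of `text` that `codec` cannot encode,
    in order of first appearance (hypothetical helper for illustration)."""
    missing = []
    for ch in text:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


for codec in ("cp437", "cp850"):
    codes = ["U+{:04X}".format(ord(ch)) for ch in unencodable(s, codec)]
    print(codec, codes)
```

It should report no problem characters for cp437 and nine for cp850: the mixed single/double corners and junctions such as U+2552 (╒). Plain │, ─ and ═ are present in both code pages.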

To complicate things further, when using the native Python console instead of IPython, the error seems more understandable:

>>> s
'\u2552═══\u2564═══\u2555\n│ 1 │ 2 │\n├───┼───┤\n│ 3 │ 4 │\n\u2558═══\u2567═══\u255b'
>>> sys.stdout.encoding
'cp850'
>>> print(s)
Traceback (most recent call last):
[...]
UnicodeEncodeError: 'charmap' codec can't encode character '\u2552' in position 0: character maps to <undefined>

So at least Python is consistent, but what's happening with IPython?
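Both halves of the standard console's behavior can be reproduced without a Windows console. The sketch below assumes that an io.TextIOWrapper over a BytesIO buffer is an adequate stand-in for the cp850 console stream. print() encodes with the stream's codec in 'strict' mode, and Python's cp850 codec is charmap-based, hence the 'charmap' in the error message. For a bare expression, sys.displayhook (as described in the Python documentation) retries the repr with the 'backslashreplace' error handler, which is where the \u2552 escapes come from.

```python
import io

row = "\u2552═══\u2564═══\u2555"  # first row of the table from the native console

# Stand-in for the console's sys.stdout: a text stream that encodes
# everything written to it as cp850 with the default 'strict' handler.
fake_stdout = io.TextIOWrapper(io.BytesIO(), encoding="cp850")

caught = None
try:
    print(row, file=fake_stdout)
except UnicodeEncodeError as exc:
    caught = exc  # keep a reference; `exc` is unbound after the except block
    print("print() failed:", exc.encoding, "codec,", exc.reason,
          "at U+{:04X}".format(ord(row[exc.start])))

# The fallback sys.displayhook uses when the repr cannot be encoded:
# re-encode it with the 'backslashreplace' error handler.
shown = repr(row).encode("cp850", errors="backslashreplace").decode("cp850")
print(shown)
```

The last line should print '\u2552═══\u2564═══\u2555', the same pattern the native console shows for the table.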


Answer 1:


IPython uses the OEM code page in interactive mode, like any other Python console program:

In [1]: '\u2552'
ERROR - failed to write data to stream: <_io.TextIOWrapper name='<stdout>' mode='w' encoding='cp850'>
Out[1]:

In [2]: !chcp
Active code page: 850
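The OEM code page is the console's own code page, separate from the ANSI code page (for example cp1252) that locale.getpreferredencoding() reports on Windows. A small sketch for comparing what Python reports, using only standard-library calls; `describe_encodings` is a hypothetical helper name. On the questioner's setup one would presumably see cp850 for the first two entries and an ANSI code page for the last, but that is an expectation, not output from the original post.

```python
import locale
import os
import sys


def describe_encodings():
    """Collect the encodings Python associates with standard output."""
    return {
        # Codec that print() uses to encode text written to sys.stdout.
        "sys.stdout.encoding": getattr(sys.stdout, "encoding", None),
        # Encoding of the terminal behind file descriptor 1 (stdout), or None
        # if it is not a terminal. On a Windows console this is normally the
        # console output code page, i.e. the value `chcp` reports.
        "os.device_encoding(1)": os.device_encoding(1),
        # The locale's preferred encoding; on Windows, the ANSI code page.
        "locale.getpreferredencoding": locale.getpreferredencoding(False),
    }


for name, value in describe_encodings().items():
    print("{:30} {}".format(name, value))
```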

The result changes if pyreadline is installed (among other things, it enables colors in the IPython console):

In [1]: '\u2552'
Out[1]: '╒'

In [2]: import sys

In [3]: sys.stdout.encoding
Out[3]: 'cp850'

In [4]: !chcp
Active code page: 850

Once pyreadline is installed, IPython's sys.displayhook writes the result to readline's console object, which uses the Windows Unicode API function WriteConsoleW(). That function can print Unicode characters even when they cannot be encoded in the current code page (to see them, you may need to configure a TrueType font such as Lucida Console in the Windows console). print(), on the other hand, still writes through sys.stdout, which encodes with cp850, so print(s) fails just as it does in the question.
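The Unicode console path described above can be sketched with ctypes. This is not pyreadline's code: it is a minimal illustration, assuming Windows and a stdout attached to a real console, of calling WriteConsoleW directly so the text never passes through the cp850 codec. `write_console_unicode` is a hypothetical helper name. Python 3.6 and later do this themselves for console streams (PEP 528), so the sketch mainly matters for older interpreters.

```python
import ctypes
import sys


def write_console_unicode(text):
    """Write `text` to the Windows console via WriteConsoleW (hypothetical helper).

    Returns False without writing anything when not running on Windows, and
    False when stdout is not attached to a console (e.g. redirected to a file).
    """
    if sys.platform != "win32":
        return False
    from ctypes import wintypes  # imported here: only meaningful on Windows

    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    kernel32.GetStdHandle.restype = wintypes.HANDLE
    kernel32.WriteConsoleW.argtypes = [
        wintypes.HANDLE, wintypes.LPCWSTR, wintypes.DWORD,
        ctypes.POINTER(wintypes.DWORD), wintypes.LPVOID,
    ]
    kernel32.WriteConsoleW.restype = wintypes.BOOL

    STD_OUTPUT_HANDLE = -11
    handle = kernel32.GetStdHandle(STD_OUTPUT_HANDLE)
    sys.stdout.flush()  # keep ordering with text already buffered by print()

    written = wintypes.DWORD(0)
    # WriteConsoleW takes a length in UTF-16 code units, not Python characters.
    n_units = len(text.encode("utf-16-le")) // 2
    ok = kernel32.WriteConsoleW(handle, text, n_units, ctypes.byref(written), None)
    return bool(ok)


table = "╒═════╤═════╕\n│   x │   y │\n╘═════╧═════╛\n"
if not write_console_unicode(table):
    # Fallback: escape whatever the stream's codec cannot represent.
    enc = getattr(sys.stdout, "encoding", None) or "utf-8"
    print(table.encode(enc, errors="backslashreplace").decode(enc), end="")
```

When stdout is redirected, WriteConsoleW fails, so the helper reports False and the caller falls back to escaping, much like the standard console's displayhook does.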



Source: https://stackoverflow.com/questions/33960660/which-character-encoding-is-the-ipython-terminal-using
