Why don't scripting languages output Unicode to the Windows console?

迷失自我 2020-12-08 11:18

The Windows console has been Unicode-aware for at least a decade, and perhaps as far back as Windows NT. However, for some reason the major cross-platform scripting languages

9 answers
  • 2020-12-08 11:45

    For Perl to fully support Windows in this way, every call to print, printf, say, warn, and die has to be modified to check three things first:

    • Is this Windows?
    • Which version of Windows? Perl still mostly works on Windows 95.
    • Is this going to the console, or somewhere else?

    Once you have that determined, you then have to use a completely different set of API functions.

    If you really want to see everything involved in doing this properly, have a look at the source of Win32::Unicode::Console.
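
    The three checks above can be sketched in Python (rather than Perl, purely for illustration). The names `choose_output_path`, `_windows_console_handle`, and `write_text` are hypothetical. The Windows branch assumes that a successful `GetConsoleMode` call is a good enough test for "this is a console", and it ignores partial writes, so treat it as a rough sketch and not as what Win32::Unicode::Console actually does.

```python
import sys


def choose_output_path(platform, is_console):
    """Answer the questions above: which kind of write call is needed?"""
    if platform != "win32":
        return "bytes"             # Unix-like: encode and write bytes
    if not is_console:
        return "bytes"             # Windows file or pipe: still a byte stream
    return "wide-console-api"      # Windows console: needs WriteConsoleW


def _windows_console_handle(stream):
    """Return the console handle behind `stream`, or None if there is none."""
    import ctypes
    import msvcrt
    from ctypes import wintypes

    try:
        handle = msvcrt.get_osfhandle(stream.fileno())
    except (AttributeError, OSError, ValueError):
        return None
    get_mode = ctypes.windll.kernel32.GetConsoleMode
    get_mode.argtypes = [wintypes.HANDLE, wintypes.LPDWORD]
    get_mode.restype = wintypes.BOOL
    mode = wintypes.DWORD()
    # GetConsoleMode fails for files and pipes, so success implies a console.
    return handle if get_mode(handle, ctypes.byref(mode)) else None


def write_text(text, stream=None):
    """Write `text` to `stream`, using the wide console API only when needed."""
    stream = sys.stdout if stream is None else stream
    handle = _windows_console_handle(stream) if sys.platform == "win32" else None
    if choose_output_path(sys.platform, handle is not None) == "wide-console-api":
        import ctypes
        from ctypes import wintypes

        write_console = ctypes.windll.kernel32.WriteConsoleW
        write_console.argtypes = [wintypes.HANDLE, wintypes.LPCWSTR,
                                  wintypes.DWORD, wintypes.LPDWORD, wintypes.LPVOID]
        write_console.restype = wintypes.BOOL
        stream.flush()
        units = len(text.encode("utf-16-le")) // 2   # length in UTF-16 code units
        written = wintypes.DWORD()
        write_console(handle, text, units, ctypes.byref(written), None)
        return
    data = text.encode("utf-8")
    raw = getattr(stream, "buffer", None)
    if raw is not None:        # text stream: bypass its own encoder
        stream.flush()
        raw.write(data)
        raw.flush()
    else:                      # already a binary stream
        stream.write(data)
```

    Python 3.6 and later already handle the console case internally (PEP 528), so the value here is mainly in seeing how much branching a single "print" needs.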


    On Linux, OpenBSD, FreeBSD, and similar OSes you can usually just call binmode on the STDOUT and STDERR file handles.

    # Encode everything printed to STDOUT and STDERR as UTF-8.
    binmode STDOUT, ':encoding(UTF-8)';
    binmode STDERR, ':encoding(UTF-8)';
    

    This assumes that the terminal is using the UTF-8 encoding.
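
    For comparison, a rough Python counterpart of the binmode calls, assuming Python 3.7 or later (where `TextIOWrapper.reconfigure` is available). The helper name `force_utf8` is made up for this sketch, and the same caveat applies: the terminal has to actually interpret the bytes as UTF-8.

```python
import io


def force_utf8(stream):
    """Switch an existing text stream to UTF-8 output, like binmode in Perl."""
    if not isinstance(stream, io.TextIOWrapper):
        raise TypeError("expected a text stream such as sys.stdout")
    # reconfigure() changes the encoding in place (Python 3.7+).
    stream.reconfigure(encoding="utf-8", errors="strict")
    return stream
```

    Calling `force_utf8(sys.stdout)` and `force_utf8(sys.stderr)` once at startup would mirror the two Perl lines.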

  • 2020-12-08 11:47

    The main problem seems to be that it is not possible to use Unicode on Windows using only the standard C library and no platform-dependent or third-party extensions. The languages you mentioned originate from Unix platforms, whose method of implementing Unicode blends well with C (they use normal char* strings, the C locale functions, and UTF-8). If you want to do Unicode in C, you more or less have to write everything twice: once using nonstandard Microsoft extensions, and once using the standard C API functions for all other operating systems. While this can be done, it usually doesn't have high priority because it's cumbersome and most scripting language developers either hate or ignore Windows anyway.

    At a more technical level, I think the basic assumption that most standard library designers make is that all I/O streams are inherently byte-based at the OS level. That is true for files on all operating systems and for all streams on Unix-like systems; the Windows console is the only exception. Thus the architecture of many class libraries and programming language standards has to be modified to a great extent if one wants to incorporate Windows console I/O.
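
    To make the mismatch concrete, here is a small sketch of what the same string looks like to a byte-based stream and to a wide-character console API. The function names are hypothetical; the only assumption is that the byte stream carries UTF-8 and the console API takes UTF-16 code units.

```python
def utf8_bytes(text):
    """Bytes a byte-oriented stream (file, pipe, Unix terminal) would receive."""
    return text.encode("utf-8")


def utf16_code_units(text):
    """16-bit code units a wide-character console API would receive."""
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


sample = "é€😀"
print(len(utf8_bytes(sample)))                      # 9 bytes
print([hex(u) for u in utf16_code_units(sample)])   # 4 code units, incl. a surrogate pair
```

    A library built around "write N bytes to a file descriptor" has no natural place for the second representation, which is why the console ends up as a special case.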

    Another, more subjective point is that Microsoft just did not do enough to promote the use of Unicode. The first Windows OS with decent (for its time) Unicode support was Windows NT 3.1, released in 1993, long before Linux and OS X grew Unicode support. Still, the transition to Unicode in those OSes has been much more seamless and unproblematic. Microsoft once again listened to the sales people instead of the engineers, and kept the technically obsolete Windows 9x around until 2001; instead of forcing developers to use a clean Unicode interface, they still ship the broken and now-unnecessary 8-bit API, and invite programmers to use it (look at a few of the recent Windows API questions on Stack Overflow: most newbies still use the horrible legacy API!).

    When Unicode came out, many people realized it was useful. Unicode started as a pure 16-bit encoding, so it was natural to use 16-bit code units. Microsoft then apparently said "OK, we have this 16-bit encoding, so we have to create a 16-bit API", not realizing that nobody would use it. The Unix luminaries, however, thought "how can we integrate this into the current system in an efficient and backward-compatible way so that people will actually use it?" and subsequently invented UTF-8, which is a brilliant piece of engineering. Just as when Unix was created, the Unix people thought a bit more, needed a bit longer, and had less financial success, but eventually got it right.
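
    The "backward-compatible" part can be checked directly. This sketch (helper names are made up) compares how plain ASCII text encodes under UTF-8 and UTF-16: the UTF-8 bytes are identical to the ASCII bytes and contain no NUL bytes, so existing NUL-terminated char* code keeps working, while the 16-bit form does not have that property.

```python
def is_ascii_compatible(text, encoding):
    """True if ASCII-only `text` encodes to the same bytes as plain ASCII."""
    return text.encode(encoding) == text.encode("ascii")


def has_embedded_nul(text, encoding):
    """True if the encoded form contains NUL bytes, which end a C string early."""
    return b"\x00" in text.encode(encoding)


for enc in ("utf-8", "utf-16-le"):
    print(enc, is_ascii_compatible("hello", enc), has_embedded_nul("hello", enc))
```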

    I cannot comment on Perl (though I think there are more Windows haters in the Perl community than in the Python community), but regarding Python I know that the BDFL (who doesn't like Windows either) has stated that adequate Unicode support on all platforms is a major goal.

  • 2020-12-08 11:53

    Michael Kaplan has a series of blog posts about the cmd console and Unicode that may be informative (while not really answering your question):

    • Conventional wisdom is retarded, aka What the @#%&* is _O_U16TEXT?

    • Anyone who says the console can't do Unicode isn't as smart as they think they are

    • A confluence of circumstances leaves a stone unturned...

    PS: Thanks @Jeff for finding the archive.org links.
