What could go wrong in switching HTML encoding from UTF-8 to UTF-16?

北荒 2021-01-13 02:44

What are the implications of a change from UTF-8 to UTF-16 for HTML encoding? I would like to know your thoughts on the issue. Are there things I need to think of before m

6 answers
  • 2021-01-13 03:19

    There is also byte order, which becomes an issue with any encoding whose code units are wider than 8 bits. UTF-16 files usually begin with a byte order mark (BOM), which is used to determine the byte order, or endianness, of that file.

    Wikipedia has a quite good explanation of this.
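
    As a minimal sketch of the byte-order point (Python, standard library only; the sample string is made up for illustration), the same text produces different bytes in each byte order, and a leading BOM is what lets a generic UTF-16 decoder tell them apart:

```python
import codecs

text = "<p>hi</p>"  # hypothetical sample markup

little = text.encode("utf-16-le")  # no BOM, least significant byte first
big = text.encode("utf-16-be")     # no BOM, most significant byte first

# Same characters, different byte sequences.
assert little != big
assert little[:2] == b"<\x00"
assert big[:2] == b"\x00<"

# With a BOM in front, the generic "utf-16" codec picks the right byte order
# and strips the BOM from the decoded text.
with_bom_le = codecs.BOM_UTF16_LE + little
with_bom_be = codecs.BOM_UTF16_BE + big
assert with_bom_le.decode("utf-16") == text
assert with_bom_be.decode("utf-16") == text
```

    Without the BOM (or an explicit "utf-16-le" / "utf-16-be" label), the decoder has to guess the byte order.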

  • 2021-01-13 03:21

    As far as I know, all modern browsers support UTF-16 encoding. But as others have pointed out, you should declare the encoding explicitly. Not all browsers and platforms will support all Unicode characters, but I think that holds regardless of which encoding you use.

    However, if bandwidth is a big issue you should probably consider gzipping the HTML. This will save much more bandwidth than switching encoding.
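
    A rough way to check the gzip claim yourself is sketched below. The page is a hypothetical, CJK-heavy and highly repetitive sample (roughly the best case for UTF-16), so real pages will compress less dramatically:

```python
import gzip

# Hypothetical sample page: mostly CJK text, which favours UTF-16 over UTF-8.
paragraph = "<p>" + "你好，世界。" * 10 + "</p>"
html = "<html><body>" + paragraph * 50 + "</body></html>"

utf8 = html.encode("utf-8")
utf16 = html.encode("utf-16")

sizes = {
    "utf-8": len(utf8),
    "utf-16": len(utf16),
    "utf-8 + gzip": len(gzip.compress(utf8)),
    "utf-16 + gzip": len(gzip.compress(utf16)),
}

for label, size in sizes.items():
    print(f"{label:>14}: {size} bytes")
```

    On this sample UTF-16 is smaller than UTF-8 before compression, but gzipped UTF-8 is smaller than either uncompressed form.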

  • 2021-01-13 03:25

    I can think of a few things that will go wrong:

    1. You MUST specify that it's UTF-16 in the HTTP header. Unlike UTF-8, UTF-16 is not ASCII compatible, which means that everything needs to be in UTF-16 from the start.
    2. Older clients don't support UTF-16. For example, anything on Windows 9x. Possibly Mac OS 9 as well.
    3. Oh, wait, I almost forgot: North American and European copies of Windows XP don't have Asian fonts installed by default.
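
    To illustrate point 1, here is a minimal sketch of declaring the charset in the `Content-Type` header and of a client reading it back before decoding. `build_response` and `charset_from_headers` are hypothetical helpers written for this example, not part of any framework:

```python
from email.message import Message
from typing import Dict, Optional, Tuple


def build_response(html: str, charset: str = "utf-16") -> Tuple[Dict[str, str], bytes]:
    """Encode the page and declare the same charset in the headers (hypothetical helper)."""
    body = html.encode(charset)
    headers = {
        "Content-Type": f"text/html; charset={charset}",
        "Content-Length": str(len(body)),
    }
    return headers, body


def charset_from_headers(headers: Dict[str, str]) -> Optional[str]:
    """Read the charset parameter roughly the way a client might (hypothetical helper)."""
    msg = Message()
    msg["Content-Type"] = headers["Content-Type"]
    return msg.get_content_charset()


headers, body = build_response("<p>hi</p>")
declared = charset_from_headers(headers)  # the client trusts the header
page = body.decode(declared)
```

    The body bytes and the declared charset have to agree; if the header said UTF-8 here, the client would decode the UTF-16 bytes incorrectly.
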
  • 2021-01-13 03:31

    I suspect most browsers won't even show your pages.

  • 2021-01-13 03:32

    The W3C's "Character Model for the World Wide Web 1.0: Fundamentals" states, "When a unique character encoding is required, the character encoding MUST be UTF-8, UTF-16 or UTF-32. US-ASCII is upwards-compatible with UTF-8 (an US-ASCII string is also a UTF-8 string, see [RFC 3629]), and UTF-8 is therefore appropriate if compatibility with US-ASCII is desired." In practice, compatibility with US-ASCII is so useful it's almost a requirement. The W3C wisely explains, "In other situations, such as for APIs, UTF-16 or UTF-32 may be more appropriate. Possible reasons for choosing one of these include efficiency of internal processing and interoperability with other processes."

  • 2021-01-13 03:41
    • Your bandwidth consumption is likely to nearly double, assuming most of your HTML is ASCII
    • Clients which incorrectly assume UTF-8 (or ASCII) will be confused
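
    Both points can be seen in a small sketch (Python; the markup is a made-up, ASCII-only sample):

```python
html = "<html><body><p>Hello</p></body></html>"  # hypothetical ASCII-only page

utf8 = html.encode("utf-8")
utf16 = html.encode("utf-16-le")  # explicit byte order, so no BOM is added

# Every ASCII character takes 1 byte in UTF-8 and 2 bytes in UTF-16.
ratio = len(utf16) / len(utf8)

# A client that wrongly assumes UTF-8 does not fail outright here;
# it silently sees a NUL character after every real character.
misread = utf16.decode("utf-8")

print(ratio)
print(repr(misread[:6]))
```

    With a BOM in front the ratio is slightly above 2, and as the share of non-ASCII text grows the gap narrows.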

    Why do you want to change to UTF-16?
