WChars, Encodings, Standards and Portability

遇见更好的自我 2020-11-22 09:11

The following may not qualify as a SO question; if it is out of bounds, please feel free to tell me to go away. The question here is basically, "Do I understand the C stand

4 Answers
  •  情话喂你
    2020-11-22 09:42

    The problem with wchar_t is that encoding-agnostic text processing is too difficult and should be avoided. If you stick with "pure C" as you say, you can use all of the w* functions like wcscat and friends, but if you want to do anything more sophisticated then you have to dive into the abyss.
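    For the simple, encoding-agnostic cases the answer mentions, the standard <wchar.h> functions really do work without knowing how wchar_t is encoded. The sketch below is illustrative and not from the original answer. join_wide is a hypothetical helper built only from wcslen, wcscpy and wcscat.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper (not a standard function): writes a + sep + b into dst.
   cap is the capacity of dst in wchar_t units, including the terminating
   L'\0'. Returns dst on success, or NULL if dst is too small.
   Nothing here depends on whether wchar_t holds UTF-16, UTF-32 or something
   else: the w* functions only copy and count wchar_t units. */
wchar_t *join_wide(wchar_t *dst, size_t cap,
                   const wchar_t *a, const wchar_t *sep, const wchar_t *b)
{
    size_t need = wcslen(a) + wcslen(sep) + wcslen(b) + 1;
    if (need > cap)
        return NULL;              /* refuse rather than overflow dst */
    wcscpy(dst, a);
    wcscat(dst, sep);
    wcscat(dst, b);
    return dst;
}
```

    For example, join_wide(buf, 32, L"foo", L", ", L"bar") leaves L"foo, bar" in buf. Note that wcslen counts wchar_t units, not characters: on a platform with a 16-bit wchar_t, a character outside the BMP occupies two units. This is where the "abyss" starts.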

    Here are some things that are much harder with wchar_t than they are if you just pick one of the UTF encodings:

    • Parsing JavaScript: Identifiers can contain certain characters outside the BMP (and let's assume that you care about this kind of correctness).

    • HTML: How do you turn &#65536; into a string of wchar_t?

    • Text editor: How do you find grapheme cluster boundaries in a wchar_t string?
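    To make the HTML point concrete, here is a minimal sketch, not from the original answer, of storing the code point from a numeric character reference such as &#65536; (U+10000) in wchar_t units. codepoint_to_wchar is a hypothetical helper. The key assumption, flagged in the code as well, is guessing the encoding from WCHAR_MAX. The C standard does not fix the encoding of wchar_t. A platform that defines __STDC_ISO_10646__ promises that wchar_t values are ISO/IEC 10646 code points, but not every platform defines it.

```c
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical helper: stores code point cp (e.g. 65536 parsed from the HTML
   reference &#65536;) into out as wchar_t units. Returns the number of units
   written (1 or 2), or 0 if cp is not a Unicode scalar value.
   ASSUMPTION: the encoding is guessed from WCHAR_MAX. A wchar_t that can hold
   0x10FFFF is treated as UTF-32; a 16-bit one is treated as UTF-16 (as on
   Windows). The C standard guarantees neither guess. */
size_t codepoint_to_wchar(uint32_t cp, wchar_t out[2])
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;                              /* surrogate or out of range */
#if WCHAR_MAX >= 0x10FFFF
    out[0] = (wchar_t)cp;                      /* assumed UTF-32: one unit */
    return 1;
#else
    if (cp <= 0xFFFF) {
        out[0] = (wchar_t)cp;                  /* BMP: one unit */
        return 1;
    }
    cp -= 0x10000;                             /* assumed UTF-16: surrogate pair */
    out[0] = (wchar_t)(0xD800 + (cp >> 10));   /* high surrogate */
    out[1] = (wchar_t)(0xDC00 + (cp & 0x3FF)); /* low surrogate */
    return 2;
#endif
}
```

    On a typical Linux/glibc build, wchar_t is 32 bits and the call for 65536 stores one unit. On Windows, where wchar_t is 16 bits, it stores the surrogate pair 0xD800 0xDC00. Code that later walks the string has to know which case it is in, which is the answer's point.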

    If I know the encoding of a string, I can examine the characters directly. If I don't know the encoding, I have to hope that whatever I want to do with a string is implemented by a library function somewhere. So the portability of wchar_t is somewhat irrelevant as I don't consider it an especially useful data type.
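    As an illustration of "examining the characters directly" once the encoding is known, here is a minimal UTF-8 decoder sketch. It is not from the original answer. utf8_decode and utf8_count are hypothetical helpers. The decoder follows the UTF-8 rules: it rejects overlong forms, surrogates and values above U+10FFFF.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decodes one code point from the UTF-8 bytes at s
   (len bytes available) into *cp. Returns the number of bytes consumed
   (1 to 4), or 0 if the input is malformed or truncated. */
size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    /* Smallest code point allowed for each sequence length (rejects overlongs). */
    static const uint32_t min_for_len[5] = {0, 0, 0x80, 0x800, 0x10000};
    size_t n;
    uint32_t c;

    if (len == 0)
        return 0;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }                  /* ASCII */
    else if ((s[0] & 0xE0) == 0xC0) { n = 2; c = s[0] & 0x1F; }
    else if ((s[0] & 0xF0) == 0xE0) { n = 3; c = s[0] & 0x0F; }
    else if ((s[0] & 0xF8) == 0xF0) { n = 4; c = s[0] & 0x07; }
    else return 0;                  /* stray continuation or invalid lead byte */

    if (len < n)
        return 0;                   /* truncated sequence */
    for (size_t i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;               /* expected a continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }
    if (c < min_for_len[n] || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;                   /* overlong, out of range, or surrogate */
    *cp = c;
    return n;
}

/* Hypothetical helper: counts code points (not bytes, and not grapheme
   clusters) in len bytes of UTF-8. Returns (size_t)-1 on invalid input. */
size_t utf8_count(const char *text, size_t len)
{
    const unsigned char *s = (const unsigned char *)text;
    size_t count = 0;
    uint32_t cp;

    while (len > 0) {
        size_t n = utf8_decode(s, len, &cp);
        if (n == 0)
            return (size_t)-1;
        s += n;
        len -= n;
        count++;
    }
    return count;
}
```

    For example, the 7-byte string "A\xC3\xA9\xF0\x90\x80\x80" (A, é, U+10000) counts as 3 code points on every platform, because nothing depends on the width or encoding of wchar_t. Grapheme cluster boundaries, as in the text-editor example, would still need the Unicode segmentation rules on top of this.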

    Your program requirements may differ and wchar_t may work fine for you.
