WChars, Encodings, Standards and Portability

遇见更好的自我 2020-11-22 09:11

The following may not qualify as a SO question; if it is out of bounds, please feel free to tell me to go away. The question here is basically, "Do I understand the C stand

4 answers

情话喂你 (original poster)

2020-11-22 09:42
The problem with wchar_t is that encoding-agnostic text processing is too difficult and should be avoided. If you stick with "pure C" as you say, you can use all of the w* functions like wcscat and friends, but if you want to do anything more sophisticated then you have to dive into the abyss.

Here are some things that are much harder with wchar_t than they are if you just pick one of the UTF encodings:
- Parsing JavaScript: Identifiers can contain certain characters outside the BMP (and let's assume that you care about this kind of correctness).
- HTML: How do you turn a numeric character reference such as &#65536; (U+10000, the character 𐀀) into a string of wchar_t?
- Text editor: How do you find grapheme cluster boundaries in a wchar_t string?

If I know the encoding of a string, I can examine the characters directly. If I don't know the encoding, I have to hope that whatever I want to do with a string is implemented by a library function somewhere. So the portability of wchar_t is somewhat irrelevant as I don't consider it an especially useful data type.

Your program requirements may differ and wchar_t may work fine for you.