Here are some excerpts from my copy of the 2014 draft standard N4140:

22.5 Standard code conversion facets [locale.stdcvt]

3 For each of the three code conversion facets codecvt_utf8, codecvt_utf16, and codecvt_utf8_utf16: ...

4 For the facet codecvt_utf8:
(4.1) The facet shall convert between UTF-8 multibyte sequences and UCS2 or UCS4 (depending on the size of Elem) within the program. ...
Let us differentiate between wchar_t and string literals built using the L prefix.
wchar_t is just an integer type, which may be larger than char.
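As a quick illustration (the sizes mentioned in the comments are what common implementations use, not guarantees):

    #include <iostream>

    int main() {
        // wchar_t is just a distinct integer type; the standard only requires that
        // it can represent the largest supported extended character set. It is
        // commonly 2 bytes on Windows and 4 bytes on Linux and macOS.
        std::cout << "sizeof(char)    = " << sizeof(char) << '\n'
                  << "sizeof(wchar_t) = " << sizeof(wchar_t) << '\n';
    }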
String literals using the L prefix will produce strings of wchar_t characters. Exactly what that means is implementation-defined: there is no requirement that such literals use any particular encoding. They might use UTF-16, UTF-32, or something else that has nothing to do with Unicode at all.
So if you want a string literal which is guaranteed to be encoded in a Unicode format across all platforms, use the u8, u, or U prefix for the string literal.
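A minimal sketch of the difference; only the prefixes matter here, and the comment on the L literal describes typical implementations rather than a requirement:

    int main() {
        const wchar_t*  w = L"caf\u00e9";  // encoding implementation-defined:
                                           // maybe UTF-16, maybe UTF-32, maybe neither
        const char*     a = u8"caf\u00e9"; // guaranteed UTF-8 (array of char in C++14)
        const char16_t* b = u"caf\u00e9";  // guaranteed UTF-16
        const char32_t* c = U"caf\u00e9";  // guaranteed UTF-32
        (void)w; (void)a; (void)b; (void)c;
    }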
One interpretation of these two paragraphs is that wchar_t must be encoded as either UCS2 or UCS4.
No, that is not a valid interpretation. wchar_t has no encoding; it's just a type. It is data which is encoded. A string literal prefixed by L may or may not be encoded in UCS2 or UCS4.
If you provide codecvt_utf8 with a string of wchar_ts that is encoded in UCS2 or UCS4 (as appropriate to sizeof(wchar_t)), then it will work. But not because of wchar_t; it works only because the data you provide is correctly encoded.
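A minimal sketch of that point, assuming a platform where the L literal really stores the UCS2/UCS4 code point values (as the usual Windows and Linux toolchains do):

    #include <codecvt>
    #include <iostream>
    #include <locale>
    #include <string>

    int main() {
        // codecvt_utf8<wchar_t> converts UTF-8 <-> UCS2 or UCS4, depending on
        // sizeof(wchar_t). It is only correct if the wchar_t data really uses
        // that encoding; the facet cannot know or check what you fed it.
        std::wstring_convert<std::codecvt_utf8<wchar_t>> conv;

        std::wstring wide = L"\u00e9";            // assumes this stores code point U+00E9
        std::string  utf8 = conv.to_bytes(wide);  // 0xC3 0xA9

        std::cout << utf8.size() << " UTF-8 bytes\n";
    }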
If 4.1 said "The facet shall convert between UTF-8 multibyte sequences and UCS2 or UCS4, or whatever encoding is imposed on wchar_t by the current global locale", there would be no problem.
But the whole point of those codecvt_* facets is to perform locale-independent conversions. If you want locale-dependent conversions, you shouldn't use them; use the codecvt facet from the global locale instead.
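For contrast, a rough sketch of the locale-dependent route, using the codecvt facet supplied by a locale (here the user's environment locale; error handling kept minimal):

    #include <cwchar>
    #include <iostream>
    #include <locale>
    #include <string>

    int main() {
        // Ask the locale's own codecvt facet to map wchar_t to whatever narrow
        // multibyte encoding that locale imposes (UTF-8, a code page, ...).
        std::locale loc("");  // the user's environment locale; may throw if unknown
        using cvt_t = std::codecvt<wchar_t, char, std::mbstate_t>;
        const cvt_t& cvt = std::use_facet<cvt_t>(loc);

        std::wstring wide = L"text";
        std::string narrow(wide.size() * cvt.max_length(), '\0');

        std::mbstate_t state{};
        const wchar_t* from_next = nullptr;
        char* to_next = nullptr;
        auto res = cvt.out(state,
                           wide.data(), wide.data() + wide.size(), from_next,
                           &narrow[0], &narrow[0] + narrow.size(), to_next);
        if (res != std::codecvt_base::ok)
            return 1;  // conversion failed (or was not needed)

        narrow.resize(to_next - &narrow[0]);
        std::cout << narrow << '\n';
    }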