Here are some excerpts from my copy of the 2014 draft standard N4140:
22.5 Standard code conversion facets [locale.stdcvt]
3 F
As Elem can be wchar_t, char16_t, or char32_t, clause 4.1 says nothing about a required wchar_t encoding. It only states something about the conversion performed.
From the wording, it is clear that the conversion is between UTF-8 and either UCS-2 or UCS-4, depending on the size of Elem. So if wchar_t is 16 bits, the conversion will be to and from UCS-2, and if it is 32 bits, UCS-4.
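As a minimal sketch of the 32-bit case, assuming an implementation that still ships the (since C++17 deprecated) <codecvt> header: the helper name utf8_to_ucs4 is my own and not part of the library. It decodes UTF-8 into char32_t, so each code point ends up in exactly one element.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper (the name is an assumption, not a library function):
// decodes a UTF-8 byte string into UCS-4, one char32_t per code point.
// Note: <codecvt> and std::wstring_convert are deprecated since C++17.
std::u32string utf8_to_ucs4(const std::string& utf8) {
    std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv;
    // from_bytes throws std::range_error if the input cannot be converted.
    return conv.from_bytes(utf8);
}
```

With this, a three-byte sequence such as "\xE2\x82\xAC" (the euro sign) and a four-byte sequence such as "\xF0\x9F\x98\x80" each become a single char32_t.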
Why does the standard mention UCS-2 and UCS-4 and not UTF-16 and UTF-32? Because codecvt_utf8 converts a multi-byte UTF-8 sequence to a single wide character.
However, it is not clear to me what happens if the UTF-8 text contains a sequence that corresponds to a Unicode character that is not available in UCS-2 when the receiving type is char16_t.
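One way to find out for a particular implementation is simply to try it. The probe below is only a sketch: try_utf8_to_ucs2 is a name I made up, and I am assuming that a conversion failure surfaces as the std::range_error that wstring_convert throws when no error string was supplied. I would expect a UCS-2 conversion to reject a character outside the Basic Multilingual Plane, but the excerpt above does not promise that, so the result is reported rather than assumed.

```cpp
#include <codecvt>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical probe (the name is an assumption, not a library function):
// tries to decode UTF-8 into char16_t elements through codecvt_utf8.
// Returns true and fills `out` on success, false if the conversion fails.
// Note: <codecvt> and std::wstring_convert are deprecated since C++17.
bool try_utf8_to_ucs2(const std::string& utf8, std::u16string& out) {
    std::wstring_convert<std::codecvt_utf8<char16_t>, char16_t> conv;
    try {
        out = conv.from_bytes(utf8);
        return true;
    } catch (const std::range_error&) {
        // Thrown by from_bytes when the facet reports a conversion error.
        return false;
    }
}
```

Calling it with "\xE2\x82\xAC" (U+20AC, inside the BMP) should succeed with a single char16_t. Calling it with "\xF0\x9F\x98\x80" (U+1F600, outside the BMP) shows how the library at hand treats the case in question.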