Today I was learning some C++ basics and came to know about wchar_t. I can't figure out why we actually need this data type. How do I use it?
The wchar_t data type is used to represent wide characters, which typically occupy 16 or 32 bits; depending on the platform, the type is 2 or 4 bytes in size.
The wchar_t data type is mostly used when working with international languages such as Japanese.
wchar_t is a wide character type. It is used to represent characters that need more memory than a regular char provides. It is, for example, widely used in the Windows API.
However, the size of a wchar_t is implementation-dependent and not guaranteed to be larger than char. If you need a character type of a specific width greater than 8 bits, you may want to turn to char16_t and char32_t, which are guaranteed to be at least 16 and 32 bits wide respectively.
wchar_t is specified in the C++ language in [basic.fundamental]/p5 as:
"Type wchar_t is a distinct type whose values can represent distinct codes for all members of the largest extended character set specified among the supported locales ([locale])."
In other words, wchar_t is meant to be a type in which a single value can hold any character of any supported locale, so that text in any language can be processed one element per character without having to deal with multi-byte encodings.
On platforms where wchar_t covers all of Unicode, including code points above the Basic Multilingual Plane, it is usually 4 bytes (Linux, BSD, macOS).
Windows is the main exception: there wchar_t is 2 bytes and holds UTF-16LE code units, for historical reasons (Windows initially supported only UCS-2).
In practice, the "1 wchar_t = 1 character" concept is even more complicated, because Unicode has combining characters and grapheme clusters (user-perceived characters represented by sequences of code points).
wchar_t is intended for representing text in fixed-width, multi-byte encodings. Where wchar_t is 2 bytes in size (notably on Windows), it can be used to represent text in any 2-byte encoding. It can also be used for representing text in variable-width encodings built on 2-byte units, of which the most common is UTF-16.
On platforms where wchar_t is 4 bytes in size, it can represent any Unicode text using UCS-4, but where it is only 2 bytes, it can represent all of Unicode only in a variable-width encoding (usually UTF-16). It's more common to use char with a variable-width encoding, e.g. UTF-8 or GB 18030.
About the only modern operating system to use wchar_t extensively is Windows. This is because Windows adopted Unicode before it was extended past U+FFFF, when a fixed-width 2-byte encoding (UCS-2) seemed sensible. UCS-2 cannot represent the whole of Unicode, so Windows now uses UTF-16, still with 2-byte wchar_t code units.
The wchar_t type is used for characters of extended character sets. Among other uses, it is the element type of std::wstring, a string that can hold each character of an extended character set in a single element, as opposed to std::string, whose elements are of size char and which may need more than one element to represent a single character (as in UTF-8).
The size of wchar_t depends on the locales the implementation supports: the standard requires it to be able to represent all members of the largest extended character set specified among them.
I understand most people have already answered this, but as I was also learning C++ basics and came to know about wchar_t, I would like to share what I understood after searching about it.
wchar_t is used when you need to store a character whose code is above 255, i.e. one that does not fit in an 8-bit char. Such characters need more bits than a char provides, and hence more memory.
e.g.:
const wchar_t* var = L"Привет мир\n"; // "hello world" in Russian
It is generally larger than an 8-bit char.
The Windows operating system uses it extensively.
It is usually used when text in a foreign language is involved.