What prerequisites are needed to do strict Unicode programming?
Does this imply that my code should not use char types anywhere and that functions need
You basically want to deal with strings in memory as wchar_t arrays instead of char arrays. When you do any kind of I/O (like reading/writing files) you can encode/decode using UTF-8 (probably the most common encoding), which is simple enough to implement; the format is specified in RFC 3629. So in memory nothing should be multi-byte: one wchar_t represents one code point, provided wchar_t is 32 bits wide. Only when you serialize do you need to encode to something like UTF-8, where some characters are represented by multiple bytes.
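To make the serialization step concrete, here is a minimal sketch of encoding a single code point as UTF-8, following the bit layout in RFC 3629. The function name utf8_encode and its return convention are my own choices for illustration, not something from a standard library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-8 (bit layout per RFC 3629).
 * Writes 1 to 4 bytes into out, which must have room for 4 bytes.
 * Returns the number of bytes written, or 0 if cp cannot be encoded
 * (a UTF-16 surrogate or a value above U+10FFFF).
 * utf8_encode is a hypothetical helper name. */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    assert(out != NULL);

    if (cp < 0x80) {                      /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                     /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                             /* beyond the Unicode range */
}
```

Calling it in a loop over an in-memory array of code points and writing the resulting bytes to a file is all the "encode on output" step amounts to; decoding on input is the same table read in reverse.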
You'll also have to switch to the wide-character versions of strcmp etc. (wcscmp, wcslen and friends, which the standard library already provides in <wchar.h>), but this isn't a big issue. The biggest problem will be interop with libraries/existing code that only accept char arrays.
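For reference, the standard library already ships wide-character counterparts of the str* functions in <wchar.h>, so in most cases nothing has to be written from scratch. A small sketch using wcslen, wcscpy and wcscmp; the wrapper names wide_copy and wide_equal are hypothetical and only there to show the calls.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Copy src into dst. capacity is counted in wchar_t units and includes
 * the terminating L'\0'. Returns 0 on success, -1 if src does not fit.
 * wide_copy is a hypothetical wrapper around wcslen/wcscpy. */
int wide_copy(wchar_t *dst, size_t capacity, const wchar_t *src)
{
    assert(dst != NULL && src != NULL);
    if (wcslen(src) + 1 > capacity)
        return -1;
    wcscpy(dst, src);
    return 0;
}

/* Wide-string equality, the counterpart of strcmp(a, b) == 0.
 * wide_equal is a hypothetical wrapper around wcscmp. */
int wide_equal(const wchar_t *a, const wchar_t *b)
{
    return wcscmp(a, b) == 0;
}
```

Note that wcscmp compares element values, just as strcmp compares bytes; it does not do locale-aware or Unicode-normalized comparison.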
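And when it comes to sizeof(wchar_t) (you will need 4 bytes if you want to do it right): it is 2 bytes on Windows and 4 bytes on most Unix-like systems. You cannot portably redefine wchar_t itself, but you can always typedef your own 32-bit character type (e.g. uint32_t, or char32_t from C11's <uchar.h>) if you need to, at the cost of not being able to use the standard wide-string functions on it.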
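One way to read the typedef suggestion is to define a separate fixed-width character type rather than touching wchar_t itself. The sketch below assumes that approach; ucs4_t, ucs4_len, ucs4_cmp and wchar_is_wide_enough are made-up names, and the two string helpers stand in for wcslen and wcscmp, which cannot be used on a non-wchar_t type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical character type that is 32 bits wide on every platform. */
typedef uint32_t ucs4_t;

/* Returns 1 if the platform's own wchar_t can hold every code point
 * (true where wchar_t is 32 bits, false where it is 16 bits). */
int wchar_is_wide_enough(void)
{
    return WCHAR_MAX >= 0x10FFFF;
}

/* Length of a zero-terminated ucs4_t string, the analogue of wcslen. */
size_t ucs4_len(const ucs4_t *s)
{
    size_t n = 0;
    assert(s != NULL);
    while (s[n] != 0)
        n++;
    return n;
}

/* Compare two zero-terminated ucs4_t strings by code point value.
 * Returns a negative, zero or positive value, like wcscmp. */
int ucs4_cmp(const ucs4_t *a, const ucs4_t *b)
{
    assert(a != NULL && b != NULL);
    while (*a != 0 && *a == *b) {
        a++;
        b++;
    }
    return (*a > *b) - (*a < *b);
}
```

The trade-off is visible here: you gain a guaranteed width, but every wcs* function you rely on needs an equivalent for the new type.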
To do strict Unicode programming:
Use only string APIs that are Unicode-aware (not strlen, strcpy, ... but their wide-string counterparts wcslen, wcscpy, ...).
Multi-byte character sequences are an encoding scheme that pre-dates UTF-16 (the encoding normally used with wchar_t on Windows), and it seems to me the term is rather Windows-specific.
I've never heard of wint_t.