I'm writing a wrapper layer, to be used with MinGW, that provides the application with a virtual UTF-8 environment. Functions which deal with filenames are wrappers which co
I'd do something like #4, but don't generate any output until you're sure the input is valid.
mbrtowc should decode the entire character. If it's outside the BMP, output the high surrogate and store the low surrogate in the mbstate_t. wcrtomb should store a high surrogate in the mbstate_t and output nothing; when the matching low surrogate arrives, output all 4 UTF-8 bytes if the character is valid.

If you are on Windows, you can convert between UTF-16 and UTF-8 a whole string at a time using MultiByteToWideChar and WideCharToMultiByte.
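As a rough sketch of the surrogate handling described above for mbrtowc and wcrtomb, the portable C below uses a hypothetical u16_state struct in place of mbstate_t (whose layout is implementation-defined and can't portably be extended) and hypothetical function names u8_decode_u16 and u16_encode_u8. The decoder reads and validates a complete UTF-8 sequence before writing anything; for a character outside the BMP it returns the high surrogate and keeps the low surrogate for the next call, which reports (size_t)-3 without consuming input, the same convention C11's mbrtoc16 uses. The encoder emits nothing for a high surrogate and writes all four bytes once the low surrogate arrives.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for mbstate_t: holds a surrogate carried between calls. */
typedef struct { uint16_t pending; } u16_state;

/* mbrtowc-style decoder (sketch). Returns bytes consumed, (size_t)-3 when a
   stored low surrogate is emitted, (size_t)-2 if the sequence is incomplete,
   (size_t)-1 if it is invalid. Nothing is written to *out on error. */
size_t u8_decode_u16(uint16_t *out, const unsigned char *s, size_t n, u16_state *st)
{
    if (st->pending) {                 /* second half of a pair from last call */
        *out = st->pending;
        st->pending = 0;
        return (size_t)-3;
    }
    if (n == 0) return (size_t)-2;
    unsigned char b = s[0];
    size_t len;
    uint32_t cp;
    if (b < 0x80) { *out = b; return 1; }
    else if (b >= 0xC2 && b <= 0xDF) { len = 2; cp = b & 0x1F; }
    else if (b >= 0xE0 && b <= 0xEF) { len = 3; cp = b & 0x0F; }
    else if (b >= 0xF0 && b <= 0xF4) { len = 4; cp = b & 0x07; }
    else return (size_t)-1;
    for (size_t i = 1; i < len; i++) {
        if (i >= n) return (size_t)-2;
        if ((s[i] & 0xC0) != 0x80) return (size_t)-1;
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    /* Reject overlong forms, encoded surrogates and values past U+10FFFF. */
    if ((len == 3 && cp < 0x800) || (len == 4 && cp < 0x10000) ||
        (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
        return (size_t)-1;
    if (cp >= 0x10000) {
        cp -= 0x10000;
        *out = (uint16_t)(0xD800 | (cp >> 10));          /* high surrogate now */
        st->pending = (uint16_t)(0xDC00 | (cp & 0x3FF)); /* low surrogate later */
    } else {
        *out = (uint16_t)cp;
    }
    return len;
}

/* wcrtomb-style encoder (sketch). Takes one UTF-16 unit; out needs room for
   4 bytes. Returns bytes written (0 while holding a high surrogate) or
   (size_t)-1 for an unpaired surrogate. */
size_t u16_encode_u8(unsigned char *out, uint16_t c, u16_state *st)
{
    uint32_t cp;
    if (st->pending) {                        /* expecting a low surrogate */
        if (c < 0xDC00 || c > 0xDFFF) return (size_t)-1;
        cp = 0x10000 + (((uint32_t)st->pending - 0xD800) << 10) + (c - 0xDC00u);
        st->pending = 0;
    } else if (c >= 0xD800 && c <= 0xDBFF) {  /* high surrogate: store only */
        st->pending = c;
        return 0;
    } else if (c >= 0xDC00 && c <= 0xDFFF) {  /* lone low surrogate */
        return (size_t)-1;
    } else {
        cp = c;
    }
    if (cp < 0x80) { out[0] = (unsigned char)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

Unlike the real mbrtowc, this sketch returns 1 rather than 0 for a NUL byte and has no null-pointer reset mode; those details would need adding in an actual wrapper.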
While GCC defaults to a 32-bit wchar_t on most targets (MinGW targets use a 16-bit wchar_t to match Windows), compiler switches such as -fshort-wchar change that. More generally, the C and C++ standards don't specify the size of wchar_t; in fact, wchar_t can be the same size as char.
If you want to avoid using Windows APIs (in your Windows wrapper code!?), use mbstowcs to convert an entire string at a time.
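A minimal sketch of the whole-string mbstowcs route follows; the helper name to_wide is hypothetical. mbstowcs converts according to the current LC_CTYPE locale, so it only decodes UTF-8 if the program has selected a UTF-8 locale with setlocale and the C runtime supports one (older Microsoft runtimes do not). Passing a null destination to get the required length is allowed by POSIX and by the Microsoft CRT.

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: converts a whole multibyte string in the current
   locale to a newly allocated wide string. Returns NULL on an invalid
   sequence or allocation failure; the caller frees the result. */
wchar_t *to_wide(const char *s)
{
    size_t n = mbstowcs(NULL, s, 0);   /* wide chars needed, excluding L'\0' */
    if (n == (size_t)-1) return NULL;  /* invalid multibyte sequence */
    wchar_t *w = malloc((n + 1) * sizeof *w);
    if (w == NULL) return NULL;
    mbstowcs(w, s, n + 1);             /* also writes the terminating L'\0' */
    return w;
}
```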