UTF8 to/from wide char conversion in STL

面向向阳花 2020-11-22 06:48

Is it possible to convert a UTF-8 string held in a std::string to std::wstring, and vice versa, in a platform-independent manner? In a Windows application I would use MultiByteToWideC

10 answers
  • 2020-11-22 07:12

    You can use the codecvt locale facet. There is a specialisation, codecvt<wchar_t, char, mbstate_t>, that may be of use to you, although its behaviour is system-specific and does not guarantee conversion to UTF-8 in any way.
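
To make the facet approach concrete, here is a minimal sketch that calls codecvt<wchar_t, char, mbstate_t>::in directly. It converts from whatever multibyte encoding the given locale uses, so the input is decoded as UTF-8 only if the locale passed in happens to be a UTF-8 one. The helper name narrow_to_wide is hypothetical.

```cpp
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper: narrow -> wide conversion through the locale's
// codecvt<wchar_t, char, mbstate_t> facet. The external encoding is whatever
// `loc` defines; UTF-8 is NOT guaranteed.
std::wstring narrow_to_wide(const std::string& in, const std::locale& loc) {
    using Facet = std::codecvt<wchar_t, char, std::mbstate_t>;
    const Facet& facet = std::use_facet<Facet>(loc);

    std::mbstate_t state{};
    // Every wide character consumes at least one input byte,
    // so in.size() wide characters is always enough room.
    std::wstring out(in.size(), L'\0');
    const char* from_next = nullptr;
    wchar_t* to_next = nullptr;

    const std::codecvt_base::result r =
        facet.in(state, in.data(), in.data() + in.size(), from_next,
                 &out[0], &out[0] + out.size(), to_next);

    // `partial` (truncated input) and `error` (invalid bytes) are both failures.
    if (r != std::codecvt_base::ok || from_next != in.data() + in.size())
        throw std::runtime_error("codecvt conversion failed");

    out.resize(static_cast<std::size_t>(to_next - &out[0]));
    return out;
}
```

With std::locale::classic() only plain ASCII can be relied on. A named locale such as "en_US.UTF-8" (an assumption: locale names are platform-specific and may not be installed) would decode UTF-8 input; constructing a std::locale from an unknown name throws std::runtime_error.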

  • 2020-11-22 07:14

    There are several ways to do this, but the results depend on which character encodings the string and wstring variables use.

    If you know the string is ASCII, you can simply use wstring's iterator constructor:

    std::string s = "This is surely ASCII.";
    std::wstring w(s.begin(), s.end());  // widens each char individually

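    If your string has some other encoding, however, you'll get wrong results: each byte is widened on its own, so a multibyte character (for example the two-byte UTF-8 encoding of "é") turns into two unrelated wide characters. If the encoding is Unicode, you could take a look at the ICU project, which provides a cross-platform set of libraries that convert to and from all sorts of Unicode encodings.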
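
Because the iterator constructor copies bytes one at a time, it can be worth enforcing the ASCII assumption rather than trusting it. A minimal sketch; the helper name widen_ascii is hypothetical:

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: widen a string that must contain only 7-bit ASCII.
// Any byte above 0x7F is rejected, because the byte-by-byte copy done by
// wstring's iterator constructor would silently produce wrong characters
// for multibyte (e.g. UTF-8) input.
std::wstring widen_ascii(const std::string& s) {
    for (unsigned char c : s) {  // unsigned, so bytes >= 0x80 are not negative
        if (c > 0x7F) {
            throw std::invalid_argument("non-ASCII byte in input");
        }
    }
    return std::wstring(s.begin(), s.end());
}
```

Reading the bytes as unsigned char matters: where char is signed, a byte such as 0xC3 would otherwise compare as a negative number and slip past the check.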

    If your string contains characters in a code page, then may $DEITY have mercy on your soul.

  • 2020-11-22 07:19

    I asked this question 5 years ago. This thread was very helpful to me back then; I came to a conclusion and moved on with my project. Funnily enough, I recently needed something similar for a project totally unrelated to that one. While researching possible solutions, I stumbled upon my own question :)

    The solution I chose now is based on C++11. The Boost libraries that Constantin mentions in his answer are now part of the standard. If we replace std::wstring with the new string type std::u16string, the conversions look like this:

    UTF-8 to UTF-16

    #include <codecvt>
    #include <locale>
    #include <string>

    std::string source;  // UTF-8 input
    // ...
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>,char16_t> convert;
    std::u16string dest = convert.from_bytes(source);    
    

    UTF-16 to UTF-8

    std::u16string source;  // UTF-16 input
    // ...
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>,char16_t> convert;
    std::string dest = convert.to_bytes(source);    
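
Wrapping the two snippets above into a pair of functions makes the round trip and the error behaviour easy to check. A minimal sketch, assuming a C++11 to C++20 compiler: std::wstring_convert and the <codecvt> facets were deprecated in C++17, so newer compilers may emit warnings. When no error string is supplied, from_bytes and to_bytes throw std::range_error on input they cannot convert. The alias and function names are hypothetical.

```cpp
#include <codecvt>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical alias for the converter used in the snippets above.
using Utf8Utf16 = std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t>;

// UTF-8 -> UTF-16. Throws std::range_error on malformed UTF-8.
std::u16string utf8_to_utf16(const std::string& s) {
    Utf8Utf16 convert;
    return convert.from_bytes(s);
}

// UTF-16 -> UTF-8. Throws std::range_error if the input cannot be converted.
std::string utf16_to_utf8(const std::u16string& s) {
    Utf8Utf16 convert;
    return convert.to_bytes(s);
}
```

A character outside the Basic Multilingual Plane, such as U+1F600 (four bytes in UTF-8), comes back as a surrogate pair, i.e. two char16_t values.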
    

    As seen from the other answers, there are multiple approaches to the problem. That's why I refrain from picking an accepted answer.
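
The question itself asks about std::wstring, whose width differs between platforms (16-bit wchar_t on Windows, 32-bit on Linux and macOS). A minimal sketch, again relying on the deprecated C++11 facets and assuming that a 16-bit wchar_t holds UTF-16 and a wider one holds UTF-32, picks the facet at compile time. The alias and function names are hypothetical.

```cpp
#include <codecvt>
#include <locale>
#include <string>
#include <type_traits>

// Assumption: a 16-bit wchar_t stores UTF-16 (Windows); a wider one stores
// UTF-32 code points (typical on Linux and macOS).
using WideFacet = std::conditional<sizeof(wchar_t) == 2,
                                   std::codecvt_utf8_utf16<wchar_t>,
                                   std::codecvt_utf8<wchar_t>>::type;

// UTF-8 -> std::wstring. Throws std::range_error on malformed input.
std::wstring utf8_to_wide(const std::string& s) {
    std::wstring_convert<WideFacet> convert;  // element type defaults to wchar_t
    return convert.from_bytes(s);
}

// std::wstring -> UTF-8. Throws std::range_error if the input cannot be converted.
std::string wide_to_utf8(const std::wstring& w) {
    std::wstring_convert<WideFacet> convert;
    return convert.to_bytes(w);
}
```

A std::wstring produced this way holds one wchar_t per code point on Linux, while on Windows a character outside the Basic Multilingual Plane occupies two.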

  • 2020-11-22 07:27

    Use the reference conversion code published by Unicode, Inc.: ConvertUTF.h and ConvertUTF.c.

    Credit to bames53 for providing updated versions.
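
For readers who cannot pull in those files, the core of what such a converter does can be sketched by hand. This is not the ConvertUTF API (whose functions work on pointer ranges and return status codes); it is a hypothetical, minimal UTF-8 to UTF-32 decoder and encoder that reports malformed input by throwing std::invalid_argument.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical decoder: UTF-8 -> UTF-32. Rejects bad lead bytes, truncated
// sequences, bad continuation bytes, overlong forms, surrogates and
// code points above U+10FFFF.
std::u32string decode_utf8(const std::string& in) {
    static const char32_t min_cp[5] = {0, 0, 0x80, 0x800, 0x10000};
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char b0 = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t len = 0;
        if (b0 < 0x80)              { cp = b0;        len = 1; }  // 0xxxxxxx
        else if ((b0 >> 5) == 0x06) { cp = b0 & 0x1F; len = 2; }  // 110xxxxx
        else if ((b0 >> 4) == 0x0E) { cp = b0 & 0x0F; len = 3; }  // 1110xxxx
        else if ((b0 >> 3) == 0x1E) { cp = b0 & 0x07; len = 4; }  // 11110xxx
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + len > in.size()) throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) throw std::invalid_argument("bad continuation byte");
            cp = (cp << 6) | (b & 0x3F);
        }
        // A code point below the minimum for its length is an overlong form.
        if (cp < min_cp[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid code point");
        out.push_back(cp);
        i += len;
    }
    return out;
}

// Hypothetical encoder: UTF-32 -> UTF-8.
std::string encode_utf8(const std::u32string& in) {
    std::string out;
    for (char32_t cp : in) {
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::invalid_argument("invalid code point");
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Going through UTF-32 keeps the sketch simple; producing UTF-16 from the decoded code points would add surrogate-pair splitting for values above U+FFFF.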
