C++ & Boost: encode/decode UTF-8

南方客 2020-12-01 06:29

I'm trying to do a very simple task: take a Unicode-aware wstring and convert it to a string encoded as UTF-8 bytes, and then the opposite way around.
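
For reference, the conversion itself is mechanical. Here is a minimal, dependency-free sketch of UTF-8 encoding and decoding in standard C++. It works on std::u32string (one code point per element) rather than std::wstring, because wchar_t is 16 bits on Windows and 32 bits on most other platforms. The names utf8_encode and utf8_decode are hypothetical, and reporting errors by throwing std::runtime_error is an assumption of this sketch. It rejects malformed sequences, overlong forms, surrogates, and values above U+10FFFF, but it is not tuned for speed.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: encode UTF-32 code points as UTF-8 bytes.
std::string utf8_encode(const std::u32string& in) {
    std::string out;
    for (char32_t cp : in) {
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::runtime_error("invalid code point");
        if (cp < 0x80) {                         // 1 byte:  0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {                 // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {               // 3 bytes: 1110xxxx + 2 continuations
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                                 // 4 bytes: 11110xxx + 3 continuations
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// Hypothetical helper: decode UTF-8 bytes into UTF-32 code points.
std::u32string utf8_decode(const std::string& in) {
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        std::size_t len;
        char32_t cp;
        // The lead byte determines the sequence length and its payload bits.
        if (lead < 0x80)                { len = 1; cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
        else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
        else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
        else throw std::runtime_error("invalid lead byte");
        if (i + len > in.size()) throw std::runtime_error("truncated sequence");
        // Each continuation byte must look like 10xxxxxx and adds 6 bits.
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) throw std::runtime_error("bad continuation byte");
            cp = (cp << 6) | (b & 0x3F);
        }
        // Reject overlong encodings, surrogates, and out-of-range values.
        static const char32_t min_for_len[] = {0, 0, 0x80, 0x800, 0x10000};
        if (cp < min_for_len[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::runtime_error("invalid code point");
        out += cp;
        i += len;
    }
    return out;
}
```

For example, utf8_encode(U"\u05e9\u05dc\u05d5\u05dd") produces the eight bytes D7 A9 D7 9C D7 95 D7 9D.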

4 Answers
  • 2020-12-01 07:18

    Thanks everyone, but ultimately I resorted to http://utfcpp.sourceforge.net/, a header-only library that's very lightweight and easy to use. I'm sharing some demo code here, should anyone find it useful:

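    #include <iterator>  // std::back_inserter
    #include <string>
    #include "utf8.h"    // utfcpp

    // Appends the decoded code points to wstr. utf8to32 assumes a 32-bit
    // wchar_t (Linux, macOS); with a 16-bit wchar_t (Windows) use utf8::utf8to16.
    inline void decode_utf8(const std::string& bytes, std::wstring& wstr)
    {
        utf8::utf8to32(bytes.begin(), bytes.end(), std::back_inserter(wstr));
    }
    // Appends the UTF-8 encoding of wstr to bytes (use utf8::utf16to8 on Windows).
    inline void encode_utf8(const std::wstring& wstr, std::string& bytes)
    {
        utf8::utf32to8(wstr.begin(), wstr.end(), std::back_inserter(bytes));
    }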
    

    Usage:

    std::wstring ws(L"\u05e9\u05dc\u05d5\u05dd");
    std::string s;
    encode_utf8(ws, s);  // s now holds the UTF-8 bytes D7 A9 D7 9C D7 95 D7 9D
    
  • 2020-12-01 07:20

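    There's already a Boost link in the comments, but the almost-standard C++0x (standardized as C++11) provides std::wstring_convert, which does exactly this (both std::wstring_convert and the <codecvt> facets were later deprecated in C++17):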

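    #include <iostream>
    #include <string>
    #include <locale>
    #include <codecvt>
    int main()
    {
        // "shalom" in Hebrew as null-terminated wide characters
        wchar_t uchars[] = {0x5e9, 0x5dc, 0x5d5, 0x5dd, 0};
        // Converts between wchar_t strings and UTF-8 byte strings. With a
        // 16-bit wchar_t (Windows) codecvt_utf8 covers only UCS-2; use
        // codecvt_utf8_utf16 there if you need surrogate pairs.
        std::wstring_convert<std::codecvt_utf8<wchar_t>> conv;
        std::string s = conv.to_bytes(uchars);   // wide -> UTF-8
        std::wstring ws2 = conv.from_bytes(s);   // UTF-8 -> wide
        std::cout << std::boolalpha
                  << (s == "\xd7\xa9\xd7\x9c\xd7\x95\xd7\x9d" ) << '\n'
                  << (ws2 == uchars ) << '\n';
    }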
    

    Output when compiled with MS Visual Studio 2010 EE SP1 or with Clang++ 2.9:

    true 
    true
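
One caveat with std::codecvt_utf8<wchar_t>: where wchar_t is 16 bits (Windows), it is limited to UCS-2, so characters outside the Basic Multilingual Plane (emoji, for instance) are not converted correctly. A minimal sketch, assuming C++14 or later, picks std::codecvt_utf8_utf16 on 16-bit wchar_t platforms and std::codecvt_utf8 elsewhere. The alias wide_facet and the helpers wide_to_utf8 and utf8_to_wide are hypothetical names.

```cpp
#include <codecvt>
#include <locale>
#include <string>
#include <type_traits>

// Choose the facet that matches the platform's wchar_t width.
// Note: <codecvt> and std::wstring_convert are deprecated since C++17.
using wide_facet = std::conditional_t<
    sizeof(wchar_t) == 2,
    std::codecvt_utf8_utf16<wchar_t>,  // 16-bit wchar_t: wstring holds UTF-16
    std::codecvt_utf8<wchar_t>>;       // 32-bit wchar_t: wstring holds UTF-32

std::string wide_to_utf8(const std::wstring& ws) {
    std::wstring_convert<wide_facet> conv;
    return conv.to_bytes(ws);
}

std::wstring utf8_to_wide(const std::string& s) {
    std::wstring_convert<wide_facet> conv;
    return conv.from_bytes(s);
}
```

With either facet, from_bytes throws std::range_error on invalid UTF-8 unless the converter was constructed with error strings to return instead.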
    
  • 2020-12-01 07:21

    For a drop-in replacement for std::string/std::wstring that handles UTF-8, see TINYUTF8.

    Combined with <codecvt>, which converts between UTF-8 and UTF-16, UCS-2, or UCS-4, you can get Unicode text from those encodings into UTF-8 and then handle it through the library above.
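
For the <codecvt> half of that suggestion, here is a minimal sketch of UTF-16 to UTF-8 conversion (and back) using std::codecvt_utf8_utf16<char16_t>, which is also deprecated since C++17. The helper names are hypothetical. The UTF-8 side is an ordinary std::string of bytes, which is the form you would then hand to a UTF-8 string library.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helpers: UTF-16 code units (char16_t) <-> UTF-8 bytes.
// A surrogate pair in the UTF-16 input becomes one 4-byte UTF-8 sequence.
std::string utf16_to_utf8(const std::u16string& s) {
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    return conv.to_bytes(s);
}

std::u16string utf8_to_utf16(const std::string& s) {
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    return conv.from_bytes(s);
}
```

For example, U+1F600 occupies two char16_t units (a surrogate pair) but a single 4-byte UTF-8 sequence.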

  • 2020-12-01 07:28

    Boost.Locale was released in Boost 1.48 (November 15th, 2011), making it easier to convert to and from UTF-8/16.

    Here are some convenient examples from the docs:

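    #include <boost/locale.hpp>
    using namespace boost::locale::conv;
    // latin1_string: a std::string holding ISO-8859-1 (Latin-1) bytes
    std::string utf8_string = to_utf<char>(latin1_string, "Latin1");
    std::wstring wide_string = to_utf<wchar_t>(latin1_string, "Latin1");
    std::string latin1_again = from_utf(wide_string, "Latin1");
    std::string utf8_string2 = utf_to_utf<char>(wide_string);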
    

    Almost as easy as Python encoding/decoding :)

    Note that Boost.Locale is not a header-only library.
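
If Latin-1 is the only legacy encoding you need, you may be able to skip linking Boost.Locale. Every ISO-8859-1 byte has the same value as its Unicode code point (U+0000 to U+00FF), so the conversion reduces to the one- and two-byte cases of UTF-8. This is a hand-rolled sketch, not how Boost.Locale implements to_utf/from_utf. The function names are hypothetical, and throwing on characters Latin-1 cannot represent is this sketch's own choice.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: Latin-1 -> UTF-8. Bytes below 0x80 are copied as-is;
// bytes 0x80..0xFF become a 2-byte sequence with lead byte 0xC2 or 0xC3.
std::string latin1_to_utf8(const std::string& latin1) {
    std::string out;
    out.reserve(latin1.size() * 2);
    for (char ch : latin1) {
        unsigned char c = static_cast<unsigned char>(ch);
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else {
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}

// Hypothetical helper: UTF-8 -> Latin-1. Throws on anything above U+00FF or
// on malformed input (this sketch does not substitute or skip characters).
std::string utf8_to_latin1(const std::string& utf8) {
    std::string out;
    for (std::size_t i = 0; i < utf8.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(utf8[i]);
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else if ((c == 0xC2 || c == 0xC3) && i + 1 < utf8.size() &&
                   (static_cast<unsigned char>(utf8[i + 1]) & 0xC0) == 0x80) {
            unsigned char next = static_cast<unsigned char>(utf8[++i]);
            out += static_cast<char>(((c & 0x1F) << 6) | (next & 0x3F));
        } else {
            throw std::runtime_error("not representable in Latin-1");
        }
    }
    return out;
}
```

For example, the Latin-1 byte E9 ("é") becomes the UTF-8 pair C3 A9.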
