How to convert Unicode code points to UTF-8 in C++?

暖寄归人 2020-12-18 05:08

I have an array of Unicode code points:

unsigned short array[3]={0x20ac,0x20ab,0x20ac};

I just want to convert this to UTF-8.

6 Answers
  • 2020-12-18 05:31

    Finally! With C++11! (Note that std::wstring_convert and the <codecvt> header were deprecated in C++17.)

    #include <string>
    #include <locale>
    #include <codecvt>
    #include <cassert>
    
    int main()
    {
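        // Converter between UTF-32 code points and UTF-8 bytes.
        std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> converter;
        // Pass a char32_t explicitly so the single-element overload is selected.
        std::string u8str = converter.to_bytes(U'\u20ac');
        assert(u8str == "\xe2\x82\xac");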
    }
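
    For the array in the question, the same conversion can also be written by hand without <codecvt>. The following is only a minimal sketch: append_utf8 and code_points_to_utf8 are hypothetical names, not part of any library, and the sketch assumes each array element is a complete code point (so lone surrogate values are simply skipped).

```cpp
#include <cstddef>
#include <string>

// Append the UTF-8 encoding of one code point to `out`.
// Returns false (and appends nothing) for surrogates and values above U+10FFFF.
bool append_utf8(char32_t cp, std::string& out)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return false; // surrogates are not valid code points
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        return false;
    }
    return true;
}

// Hypothetical helper: encode an array of 16-bit code points as UTF-8.
// Elements that cannot be encoded (lone surrogates) are skipped.
std::string code_points_to_utf8(const unsigned short* data, std::size_t count)
{
    std::string out;
    for (std::size_t i = 0; i < count; ++i)
        append_utf8(data[i], out);
    return out;
}
```

    Calling code_points_to_utf8(array, 3) on the array from the question should yield the nine bytes E2 82 AC E2 82 AB E2 82 AC.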
    
  • The following code may help you (it relies on ATL, so it is Windows-only):

    #include <atlconv.h>
    #include <atlstr.h>
    
    #define ASSERT ATLASSERT
    
    int main()
    {
        const CStringW unicode1 = L"\x0391 and \x03A9"; // 'Alpha' and 'Omega'
    
        const CStringA utf8 = CW2A(unicode1, CP_UTF8);
    
        ASSERT(utf8.GetLength() > unicode1.GetLength());
    
        const CStringW unicode2 = CA2W(utf8, CP_UTF8);
    
        ASSERT(unicode1 == unicode2);
    }
    
  • 2020-12-18 05:33

    With standard C++:

    #include <cwchar>
    #include <iostream>
    #include <locale>
    #include <string>
    #include <vector>
    
    int main()
    {
        typedef std::codecvt<wchar_t, char, std::mbstate_t> Convert;
        std::wstring w = L"\u20ac\u20ab\u20ac";
        // The named locale must be installed on the system; otherwise
        // the std::locale constructor throws std::runtime_error.
        std::locale locale("en_GB.utf8");
        const Convert& convert = std::use_facet<Convert>(locale);
    
        // The conversion state must start out zero-initialised.
        std::mbstate_t state = std::mbstate_t();
        const wchar_t* from_ptr;
        char* to_ptr;
        // Up to 4 UTF-8 bytes per wchar_t, plus room for a terminating NUL.
        std::vector<char> result(4 * w.size() + 1, 0);
        Convert::result convert_result = convert.out(state,
              w.c_str(), w.c_str() + w.size(), from_ptr,
              result.data(), result.data() + result.size(), to_ptr);
    
        if (convert_result == Convert::ok)
            std::cout << result.data() << std::endl;
        else std::cout << "Failure: " << convert_result << std::endl;
    }
    
  • 2020-12-18 05:34

    The term Unicode refers to a standard for encoding and handling text. It covers several encodings, such as UTF-8, UTF-16, UTF-32 and UCS-2.

    I guess you are programming in a Windows environment, where Unicode typically refers to UTF-16.
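
    To illustrate the difference between UTF-16 code units and code points: a 16-bit array can hold code points up to U+FFFF directly, while anything above that is stored as a surrogate pair. The following is a rough sketch of the decoding step, with utf16_to_code_points as a hypothetical helper name; it assumes lone surrogates should simply be passed through unchanged.

```cpp
#include <cstddef>
#include <vector>

// Decode UTF-16 code units into Unicode code points.
// A high surrogate followed by a low surrogate becomes one code point;
// every other unit (including a lone surrogate) is copied as-is.
std::vector<char32_t> utf16_to_code_points(const unsigned short* units,
                                           std::size_t count)
{
    std::vector<char32_t> out;
    for (std::size_t i = 0; i < count; ++i) {
        char32_t u = units[i];
        bool is_high = (u >= 0xD800 && u <= 0xDBFF);
        if (is_high && i + 1 < count &&
            units[i + 1] >= 0xDC00 && units[i + 1] <= 0xDFFF) {
            char32_t low = units[i + 1];
            out.push_back(0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00));
            ++i; // the low surrogate has been consumed
        } else {
            out.push_back(u);
        }
    }
    return out;
}
```

    For the array in the question every unit is below 0xD800, so the output is the same three values; the pair 0xD83D, 0xDE00 would decode to the single code point U+1F600.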

    When working with Unicode in C++, I would recommend the ICU library.

    If you are programming on Windows, don't want to use an external library, and have no constraints regarding platform dependencies, you can use WideCharToMultiByte.

    Example for ICU:

    #include <iostream>
    #include <string>
    #include <unicode/ustream.h>
    
    using icu::UnicodeString;
    
    int main(int, char**) {
        //
        // Convert from UTF-16 to UTF-8
        //
        // Note: the wchar_t constructor of UnicodeString is only available
        // where wchar_t is 16 bits wide (e.g. on Windows).
        std::wstring utf16 = L"foobar";
        UnicodeString str(utf16.c_str());
        std::string utf8;
        str.toUTF8String(utf8);
    
        std::cout << utf8 << std::endl;
    }
    

    To do exactly what you want:

    // Assuming you have ICU\include in your include path
    // and ICU\lib(64) in your library path.
    #include <iostream>
    #include <fstream>
    #include <string>
    #include <unicode/ustream.h>
    #pragma comment(lib, "icuio.lib")
    #pragma comment(lib, "icuuc.lib")
    
    void writeUtf16ToUtf8File(char const* fileName, wchar_t const* arr, size_t arrSize) {
        // UnicodeString takes its length as int32_t.
        icu::UnicodeString str(arr, static_cast<int32_t>(arrSize));
        std::string utf8;
        str.toUTF8String(utf8);
    
        // Binary mode keeps the UTF-8 bytes untouched on Windows.
        std::ofstream out(fileName, std::ofstream::binary);
        out << utf8;
        out.close();
    }
    
  • 2020-12-18 05:41

    Iconv is a popular library used on many platforms.
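
    As a rough illustration of what using it might look like, here is a sketch built on the POSIX iconv_open / iconv / iconv_close calls. It assumes a platform whose iconv takes a char** input pointer (as glibc does) and supports the "UTF-16LE" and "UTF-8" encoding names; utf16_to_utf8_iconv is a hypothetical helper name.

```cpp
#include <iconv.h>
#include <cstddef>
#include <stdexcept>
#include <string>

// Convert UTF-16 code units to UTF-8 using POSIX iconv.
std::string utf16_to_utf8_iconv(const unsigned short* units, std::size_t count)
{
    // Serialise the units as little-endian bytes so that "UTF-16LE"
    // is the right source encoding regardless of host byte order.
    std::string in;
    for (std::size_t i = 0; i < count; ++i) {
        in += static_cast<char>(units[i] & 0xFF);
        in += static_cast<char>(units[i] >> 8);
    }

    iconv_t cd = iconv_open("UTF-8", "UTF-16LE");
    if (cd == (iconv_t)-1)
        throw std::runtime_error("iconv_open failed");

    // At most 3 UTF-8 bytes are produced per UTF-16 code unit.
    std::string out(count * 3, '\0');
    char* in_ptr = &in[0];
    std::size_t in_left = in.size();
    char* out_ptr = &out[0];
    std::size_t out_left = out.size();

    std::size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);
    if (rc == (std::size_t)-1)
        throw std::runtime_error("iconv conversion failed");

    out.resize(out.size() - out_left); // drop the unused tail
    return out;
}
```

    On some systems (for example macOS) the program may additionally need to be linked with -liconv.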

  • 2020-12-18 05:47

    This code uses WideCharToMultiByte (I assume that you are using Windows):

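    // Requires <windows.h> and <cstdlib>; wchar_t is a 16-bit UTF-16 unit on Windows.
    const wchar_t wide_str[3] = {0x20ac, 0x20ab, 0x20ac};
    // First call: query the required size; +1 leaves room for a terminating NUL.
    int utf8_size = WideCharToMultiByte(CP_UTF8, 0, wide_str, 3, NULL, 0, NULL, NULL) + 1;
    // calloc zero-fills the buffer; release it with free() when done.
    char* utf8_str = static_cast<char*>(calloc(utf8_size, 1));
    // Second call: perform the actual conversion.
    WideCharToMultiByte(CP_UTF8, 0, wide_str, 3, utf8_str, utf8_size, NULL, NULL);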
    

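    You need to call it twice: the first time to get the number of output bytes, and the second time to actually perform the conversion. If you already know the output buffer size, you can skip the first call. Alternatively, you can simply allocate a buffer twice the size of the original in bytes, plus 1 byte for the terminator (in your case 12 + 1 bytes). That is always enough, because each 16-bit UTF-16 code unit produces at most 3 bytes of UTF-8.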
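
    The buffer-size rule above can be checked with a small portable function that counts the UTF-8 bytes needed for a run of UTF-16 code units. This is only a sketch: utf8_length is a hypothetical name, and it assumes a lone surrogate is counted as 3 bytes (the size of the U+FFFD replacement character).

```cpp
#include <cstddef>

// Number of UTF-8 bytes needed to encode `count` UTF-16 code units.
std::size_t utf8_length(const unsigned short* units, std::size_t count)
{
    std::size_t bytes = 0;
    for (std::size_t i = 0; i < count; ++i) {
        unsigned short u = units[i];
        if (u < 0x80) {
            bytes += 1;
        } else if (u < 0x800) {
            bytes += 2;
        } else if (u >= 0xD800 && u <= 0xDBFF && i + 1 < count &&
                   units[i + 1] >= 0xDC00 && units[i + 1] <= 0xDFFF) {
            bytes += 4; // a surrogate pair encodes one 4-byte sequence
            ++i;        // skip the low surrogate
        } else {
            bytes += 3; // rest of the BMP, or a lone surrogate (assumption)
        }
    }
    return bytes;
}
```

    For the three code units in the question this gives 9 bytes, comfortably within the 12 + 1 byte buffer suggested above.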
