Unsigned integer as UTF-8 value

故里飘歌 2021-02-03 16:46

Assuming that I have

uint32_t a(0x3084); // hex literal: code point U+3084 is 0x3084, not decimal 3084

I would like to create a string that stores the Unicode character U+3084, which means that I

4 Answers
  • 2021-02-03 16:50

    The C++ standard contains the std::codecvt<char32_t, char, mbstate_t> facet, which converts between UTF-32 and UTF-8 according to 22.4.1.4 [locale.codecvt] paragraph 3. Sadly, the std::codecvt<...> facets aren't easy to use. At some point there was discussion about filtering stream buffers that would take care of the code conversion (the standard C++ library needs to implement them anyway for std::basic_filebuf<...>), but I can't see any trace of these.
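To make "not easy to use" concrete, here is a minimal sketch of calling the facet's out() member directly. The names Utf32ToUtf8Facet and EncodeWithFacet are hypothetical helpers, not standard library names. It assumes a C++11 to C++17 toolchain; this char32_t/char specialization is deprecated as of C++20, so newer compilers may warn.

```cpp
#include <cassert>
#include <cwchar>   // std::mbstate_t
#include <locale>   // std::codecvt, std::codecvt_base
#include <string>

// std::codecvt has a protected destructor, so a small derived type with a
// public destructor is needed to create a facet object on the stack.
// (Hypothetical helper name.)
struct Utf32ToUtf8Facet : std::codecvt<char32_t, char, std::mbstate_t> {
    ~Utf32ToUtf8Facet() {}
};

// Encodes a single code point as UTF-8 through the facet.
// Returns an empty string if the facet reports anything other than ok.
std::string EncodeWithFacet(char32_t code_point) {
    Utf32ToUtf8Facet facet;
    std::mbstate_t state = std::mbstate_t();
    char buffer[4];  // a UTF-8 sequence is at most 4 bytes long
    const char32_t* from_next = nullptr;
    char* to_next = nullptr;

    const std::codecvt_base::result result = facet.out(
        state,
        &code_point, &code_point + 1, from_next,
        buffer, buffer + sizeof buffer, to_next);

    if (result != std::codecvt_base::ok)
        return std::string();

    // On success the whole input range has been consumed.
    assert(from_next == &code_point + 1);
    return std::string(buffer, to_next);
}
```

Calling EncodeWithFacet(0x3084) should yield the three bytes e3 82 84. Five out-parameters for one code point is the kind of friction the answer is referring to.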

  • 2021-02-03 16:55

    std::wstring_convert::to_bytes has a single-char overload just for you.

    #include <cstdint>   // uint32_t
    #include <iostream>
    #include <string>
    #include <locale>
    #include <codecvt>
    #include <iomanip>
    
    // utility function for output
    void hex_print(const std::string& s)
    {
        std::cout << std::hex << std::setfill('0');
        for(unsigned char c : s)
            std::cout << std::setw(2) << static_cast<int>(c) << ' ';
        std::cout << std::dec << '\n';
    }
    
    int main()
    {
        uint32_t a(3084); // decimal 3084 is U+0C0C; write 0x3084 to encode U+3084
    
        std::wstring_convert<std::codecvt_utf8<char32_t>, char32_t> conv1;
        std::string u8str = conv1.to_bytes(a);
        std::cout << "UTF-8 conversion produced " << u8str.size() << " bytes:\n";
        hex_print(u8str);
    }
    

    I get (with libc++)

    $ ./test
    UTF-8 conversion produced 3 bytes:
    e0 b0 8c 
    
  • 2021-02-03 17:04
    auto s = u8"\343\202\204"; // Octal escaped representation of HIRAGANA LETTER YA
    std::cout << s << std::endl;
    

    prints

    や

    for me (using g++ 4.8.1). s has type const char*, as you'd expect, but I don't know whether this is implementation-defined. Unfortunately, C++ doesn't have any support for manipulating UTF-8 strings as far as I know; for that you need a library such as Glib::ustring.
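The point about missing UTF-8 support can be illustrated with a small sketch: std::string only sees bytes, so size() reports 3 for this single character. CountCodePoints is a hypothetical helper written for this example, and it assumes the input is already valid UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts code points by counting every byte that is not a UTF-8
// continuation byte (bit pattern 10xxxxxx).
// Assumption: the input is valid UTF-8; no validation is performed.
std::size_t CountCodePoints(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80)
            ++count;
    }
    // There can never be more code points than bytes.
    assert(count <= utf8.size());
    return count;
}
```

For the literal "\343\202\204", size() is 3 while CountCodePoints returns 1. Anything beyond this kind of counting (case mapping, normalization, grapheme handling) is where a dedicated library becomes necessary.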

  • 2021-02-03 17:09

    Here's some C++ code that wouldn't be hard to convert to C. Adapted from an older answer.

    std::string UnicodeToUTF8(unsigned int codepoint)
    {
        std::string out;
    
        if (codepoint <= 0x7f)
            out.append(1, static_cast<char>(codepoint));
        else if (codepoint <= 0x7ff)
        {
            out.append(1, static_cast<char>(0xc0 | ((codepoint >> 6) & 0x1f)));
            out.append(1, static_cast<char>(0x80 | (codepoint & 0x3f)));
        }
        else if (codepoint <= 0xffff)
        {
            out.append(1, static_cast<char>(0xe0 | ((codepoint >> 12) & 0x0f)));
            out.append(1, static_cast<char>(0x80 | ((codepoint >> 6) & 0x3f)));
            out.append(1, static_cast<char>(0x80 | (codepoint & 0x3f)));
        }
        else
        {
            out.append(1, static_cast<char>(0xf0 | ((codepoint >> 18) & 0x07)));
            out.append(1, static_cast<char>(0x80 | ((codepoint >> 12) & 0x3f)));
            out.append(1, static_cast<char>(0x80 | ((codepoint >> 6) & 0x3f)));
            out.append(1, static_cast<char>(0x80 | (codepoint & 0x3f)));
        }
        return out;
    }
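The function above encodes whatever value it is given, including surrogates and values past U+10FFFF. A possible checked variant is sketched below. EncodeUtf8Checked is a hypothetical name, and returning an empty string on invalid input is an assumption made for this example, not part of the original answer.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical checked variant: returns an empty string for values that are
// not Unicode scalar values (surrogates U+D800..U+DFFF, or above U+10FFFF).
std::string EncodeUtf8Checked(std::uint32_t cp) {
    const bool is_surrogate = cp >= 0xD800 && cp <= 0xDFFF;
    if (is_surrogate || cp > 0x10FFFF)
        return std::string();

    std::string out;
    if (cp <= 0x7F) {
        // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp <= 0x7FF) {
        // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0xFFFF) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    assert(out.size() >= 1 && out.size() <= 4);
    return out;
}
```

With this sketch, EncodeUtf8Checked(0x3084) gives the bytes e3 82 84, while decimal 3084 (U+0C0C) gives e0 b0 8c, which matches the output shown in the std::wstring_convert answer.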
    