C++: how to convert ASCII or ANSI to UTF-8 and store it in a std::string

庸人自扰 2021-02-06 18:03

My company uses some code like this:

    std::string(CT2CA(some_CString)).c_str()

which I believe converts a Unicode string (whose type is CS

2 answers
  • 2021-02-06 18:47

    This sounds like a plain conversion from one encoding to another: you can use std::codecvt<char, char, mbstate_t> for this. Whether your implementation ships with a suitable conversion, I don't know, however. From the sound of it, you are just trying to convert ISO-Latin-1 into Unicode. That should be pretty much trivial: the first 128 characters (0 to 127) map identically to UTF-8, and the second half conveniently maps to the corresponding Unicode code points, i.e., you just need to encode the corresponding value into UTF-8. Each character in the second half will be replaced by two bytes. That is, I think the conversion is something like this:

    #include <stdexcept>

    // Takes the next position and the end of a buffer as first two arguments and the
    // character to convert from ISO-Latin-1 as third argument.
    // Returns a pointer to the end of the produced sequence.
    char* iso_latin_1_to_utf8(char* buffer, char* end, unsigned char c) {
        if (c < 128) {
            // ASCII range: copied unchanged as a single byte.
            if (buffer == end) { throw std::runtime_error("out of space"); }
            *buffer++ = static_cast<char>(c);
        }
        else {
            // 128..255: two bytes, 110000xx followed by 10xxxxxx.
            if (end - buffer < 2) { throw std::runtime_error("out of space"); }
            *buffer++ = static_cast<char>(0xC0 | (c >> 6));
            *buffer++ = static_cast<char>(0x80 | (c & 0x3f));
        }
        return buffer;
    }
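
    Since the question asks for the result in a std::string, here is a minimal sketch of the same per-character rule applied to a whole string. The name latin1_to_utf8 is hypothetical and not part of the answer above, and the sketch assumes the input really is ISO-Latin-1 (it will not be correct for other "ANSI" code pages such as Windows-1252).

```cpp
#include <string>

// Hypothetical helper (an assumption, not from the original answer):
// converts an ISO-Latin-1 encoded string to UTF-8.
std::string latin1_to_utf8(const std::string& in) {
    std::string out;
    out.reserve(in.size() * 2);  // worst case: every byte becomes two bytes
    for (unsigned char c : in) {
        if (c < 0x80) {
            // ASCII range: copied unchanged.
            out.push_back(static_cast<char>(c));
        } else {
            // 0x80..0xFF: lead byte 110000xx, continuation byte 10xxxxxx.
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}
```

    Appending to a std::string avoids the manual buffer-size checks, at the cost of possible reallocation if reserve is omitted.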
    
  • 2021-02-06 18:56

    Be careful: it's '|' and not '&'!

    *buffer++ = 0xC0 | (c >> 6);
    *buffer++ = 0x80 | (c & 0x3F);
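
    To see why the operator matters, here is a small worked example for the ISO-Latin-1 byte 0xE9 ('é'), written as compile-time checks. The variable names are only illustrative.

```cpp
// Worked example for the ISO-Latin-1 byte 0xE9 (U+00E9).
constexpr unsigned char c = 0xE9;  // binary 1110 1001

// OR keeps the UTF-8 marker bits and adds the payload bits.
constexpr int lead  = 0xC0 | (c >> 6);    // 1100 0000 | 0000 0011
constexpr int trail = 0x80 | (c & 0x3F);  // 1000 0000 | 0010 1001
static_assert(lead == 0xC3, "lead byte of U+00E9 in UTF-8");
static_assert(trail == 0xA9, "continuation byte of U+00E9 in UTF-8");

// AND yields zero here: the marker bits and the payload bits never overlap.
static_assert((0xC0 & (c >> 6)) == 0, "AND discards marker and payload");
static_assert((0x80 & (c & 0x3F)) == 0, "AND discards marker and payload");
```

    The resulting byte pair 0xC3 0xA9 is the UTF-8 encoding of U+00E9.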
    