C++: iterate over a UTF-8 string or split it into an array of symbols?

庸人自扰 2020-12-25 08:26

I'm looking for a platform-independent way, without third-party libraries, to iterate over a UTF-8 string or split it into an array of UTF-8 symbols.

Please post a code snippet.

5 answers
  •  小蘑菇 (original poster)
    2020-12-25 08:51

    Solved using the tiny, platform-independent UTF8-CPP library:

        #include <cstdint>
        #include "utf8.h"                           // UTF8-CPP header

        const char* str_i = text.c_str();           // string iterator; text is a UTF-8 std::string
        const char* end = str_i + text.size();      // end iterator, excludes the terminating '\0'

        while (str_i < end)
        {
            // decode one code point and advance str_i; throws on invalid UTF-8
            uint32_t code = utf8::next(str_i, end);

            unsigned char symbol[5] = {0};          // up to 4 bytes plus the terminating '\0'
            utf8::append(code, symbol);             // re-encode the code point into symbol

            // ... do something with symbol
        }
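
Since the question asked for a way that avoids third-party libraries, here is a minimal sketch that uses only the standard library. It relies on the fact that the lead byte of a UTF-8 sequence encodes the sequence length. The names `utf8_sequence_length` and `split_utf8` are hypothetical helpers, not part of UTF8-CPP or the standard library, and the sketch assumes mostly well-formed input: it does not validate continuation bytes, overlong encodings, or surrogates, and it treats any invalid lead byte as a one-byte symbol so the loop always advances.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Length in bytes of the UTF-8 sequence introduced by lead byte `b`.
// Returns 1 for stray continuation bytes or invalid lead bytes so the
// caller always makes progress.
inline std::size_t utf8_sequence_length(unsigned char b)
{
    if (b < 0x80)           return 1; // 0xxxxxxx: ASCII
    if ((b & 0xE0) == 0xC0) return 2; // 110xxxxx
    if ((b & 0xF0) == 0xE0) return 3; // 1110xxxx
    if ((b & 0xF8) == 0xF0) return 4; // 11110xxx
    return 1;                         // not a valid lead byte
}

// Hypothetical helper: splits `text` into one std::string per code point.
inline std::vector<std::string> split_utf8(const std::string& text)
{
    std::vector<std::string> symbols;
    std::size_t i = 0;
    while (i < text.size())
    {
        std::size_t len = utf8_sequence_length(static_cast<unsigned char>(text[i]));
        if (len > text.size() - i)
            len = text.size() - i;    // truncated sequence at end of input
        symbols.push_back(text.substr(i, len));
        i += len;
    }
    return symbols;
}
```

Calling `split_utf8(text)` gives a `std::vector<std::string>` that can be iterated with a range-based `for`. Note that each element is one code point, which is not always one user-visible character: a base letter followed by a combining mark comes back as two elements.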
    
