Handle UTF-8 string

慢半拍i 2021-01-13 06:30

As far as I know, Linux uses UTF-8 encoding. That means I can use std::string to handle strings, right? The contents will just be UTF-8 encoded.

Now on UTF-8 we know s

5 answers
  •  北荒
    北荒 (original poster)
    2021-01-13 07:00

    I'm using the libunistring library, which can help with all of these questions.

    For example, here is a simple function that returns a string's length in UTF-8 characters (code points) rather than bytes:

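    #include <cstddef>
    #include <cstdint>
    #include <iostream>
    #include <string>
    #include <unistr.h>  // libunistring (link with -lunistring): u8_next, ucs4_t

    // Counts the characters (code points) in a NUL-terminated UTF-8 string.
    // Stops at the first broken character and returns the count up to it.
    size_t my_utf8_strlen(const uint8_t *str) {
        if (str == NULL) return 0;

        size_t length = 0;
        const uint8_t *current = str;
        ucs4_t ucs_c = 0;  // Decoded UTF-8 character.

        while (*current) {
            // u8_next returns NULL when it cannot decode another character.
            current = u8_next(&ucs_c, current);
            if (current == NULL) break;  // Broken character.
            length++;
        }

        return length;
    }

    // Use case
    int main() {
        std::string test;

        // Load some text into `test`.
        // ...

        // std::string stores char; libunistring expects const uint8_t *.
        std::cout << my_utf8_strlen(reinterpret_cast<const uint8_t *>(test.c_str())) << std::endl;
        return 0;
    }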
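
    If pulling in libunistring is not an option, a rough alternative is to count every byte that is not a UTF-8 continuation byte (one of the form 10xxxxxx). This is only a sketch: the name count_code_points is made up for illustration, and it assumes the input is already valid UTF-8, so unlike the function above it does not detect broken characters.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of libunistring or the standard library).
// Counts UTF-8 code points by skipping continuation bytes (10xxxxxx).
// Assumes `s` holds valid UTF-8; malformed input is not detected.
std::size_t count_code_points(const std::string &s) {
    std::size_t count = 0;
    for (unsigned char byte : s) {
        // Lead bytes and ASCII bytes never match the 10xxxxxx pattern.
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

    For the two-byte sequence "\xC3\xA9" (the letter é), std::string::size() reports 2 because it counts bytes, while count_code_points reports 1.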
    
