As far as I know, Linux uses UTF-8 encoding.
This means I can use std::string
for handling strings, right?
The encoding will just be UTF-8.
Now on UTF-8 we know s
I use the libunistring library, which can help with these questions.
For example, here is a simple function that returns the length of a string in UTF-8 characters:
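#include <stddef.h>
#include <stdint.h>
#include <unistr.h>  // libunistring: u8_next, ucs4_t

// Counts the UTF-8 characters (code points) in a NUL-terminated string.
// Stops at the first broken character and returns the count up to that point.
size_t my_utf8_strlen(const uint8_t *str) {
    if (str == NULL) return 0;
    size_t length = 0;
    const uint8_t *current = str;
    // Decoded UTF-8 character.
    ucs4_t ucs_c = 0;
    while (*current) {
        // u8_next stores the decoded character in ucs_c and returns a pointer
        // past it, or NULL when it cannot decode a character.
        current = u8_next(&ucs_c, current);
        // Broken character.
        if (current == NULL)
            break;
        length++;
    }
    return length;
}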
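// Use case (requires <iostream> and <string>)
std::string test;
// Load some text into `test`.
// ...
// std::string stores char, so cast to the uint8_t pointer that libunistring expects.
std::cout << my_utf8_strlen(reinterpret_cast<uint8_t *>(&test[0])) << std::endl;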
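For comparison, here is a minimal sketch that does not depend on libunistring. Since std::string stores the raw UTF-8 bytes, size() reports bytes, and the number of characters can be found by counting every byte that is not a continuation byte (bit pattern 10xxxxxx). The helper name count_code_points is hypothetical, and the sketch assumes the input is already valid UTF-8; unlike the libunistring version, it does no validation.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts code points in a string assumed to be valid UTF-8.
// Continuation bytes look like 10xxxxxx, so every other byte starts a new
// code point.
std::size_t count_code_points(const std::string &s) {
    std::size_t count = 0;
    for (unsigned char byte : s) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For the string "h\xC3\xA9llo" (héllo, with é encoded as two bytes), size() returns 6 while count_code_points returns 5.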