As far as I know, Linux uses UTF-8 encoding. Does this mean I can use `std::string` for handling strings, and the encoding will simply be UTF-8?
Now on UTF-8 we know s
You may want to convert UTF-8 encoded strings to some kind of fixed-width encoding before manipulating them, but that depends on what you are trying to do.
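As one possible fixed-width target, here is a minimal sketch that decodes UTF-8 into UTF-32 (`std::u32string`, one `char32_t` per code point). The function name `utf8_to_utf32` is hypothetical. The sketch rejects stray continuation bytes, invalid lead bytes and truncated sequences, but it does not reject overlong encodings or surrogate code points. (The standard `std::wstring_convert`/`std::codecvt_utf8` facilities can also do this conversion, but they are deprecated since C++17.)

```cpp
#include <cassert>   // only needed for the usage checks
#include <cstddef>
#include <stdexcept>
#include <string>

// Decode UTF-8 bytes into UTF-32 code points (one char32_t each).
// Sketch only: does not reject overlong forms or surrogates.
std::u32string utf8_to_utf32(const std::string& in) {
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes that follow
        if (lead < 0x80) {
            cp = lead;                    // 0xxxxxxx: plain ASCII
        } else if (lead < 0xC0) {
            throw std::runtime_error("stray continuation byte");
        } else if (lead < 0xE0) {
            cp = lead & 0x1F; extra = 1;  // 110xxxxx
        } else if (lead < 0xF0) {
            cp = lead & 0x0F; extra = 2;  // 1110xxxx
        } else if (lead < 0xF8) {
            cp = lead & 0x07; extra = 3;  // 11110xxx
        } else {
            throw std::runtime_error("invalid lead byte");
        }
        if (i + extra >= in.size()) {
            throw std::runtime_error("truncated sequence");
        }
        // Each continuation byte (10xxxxxx) contributes 6 more bits.
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) {
                throw std::runtime_error("expected continuation byte");
            }
            cp = (cp << 6) | (c & 0x3F);
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

After the conversion, `out.size()` is the number of code points and `out[n]` indexes the n-th code point directly, at the cost of four bytes per code point.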
To get the length of a UTF-8 string in bytes, just use `str.size()`. Getting the length in characters is slightly more difficult, but you can do it by ignoring every byte in the string whose value is >= 0x80 and < 0xC0: in UTF-8, those values are always trailing (continuation) bytes. So count the bytes in that range and subtract the count from the size of the string.
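A minimal sketch of that counting approach, assuming the input is valid UTF-8 (`utf8_length` is a hypothetical name). Note the cast to `unsigned char`: plain `char` may be signed, in which case comparing it directly against 0x80 would never match.

```cpp
#include <cassert>   // only needed for the usage checks
#include <cstddef>
#include <string>

// Number of code points = total bytes minus continuation bytes (0x80..0xBF).
// Assumes the input is valid UTF-8.
std::size_t utf8_length(const std::string& str) {
    std::size_t trailing = 0;
    for (char ch : str) {
        // Compare as unsigned: plain char may be signed on this platform.
        unsigned char c = static_cast<unsigned char>(ch);
        if (c >= 0x80 && c < 0xC0) {
            ++trailing;
        }
    }
    return str.size() - trailing;
}
```

For example, `"h\xC3\xA9llo"` ("héllo") has a `size()` of 6 bytes but a `utf8_length` of 5.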
The above ignores the issue of combining characters, so what it really counts is code points. Whether that is good enough depends on what your definition of "character" is.
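To make the combining-character caveat concrete: the two strings below both display as "é", but one is the single precomposed code point U+00E9 and the other is "e" followed by U+0301 COMBINING ACUTE ACCENT. `code_points` is a hypothetical helper that applies the same continuation-byte rule described above, written with `std::count_if`. Counting user-perceived characters (grapheme clusters) instead requires the Unicode text segmentation rules, for example via a library such as ICU.

```cpp
#include <algorithm>
#include <cassert>   // only needed for the usage checks
#include <cstddef>
#include <string>

// Count code points: every byte that is NOT a continuation byte
// (10xxxxxx) starts a new code point. Assumes valid UTF-8.
std::size_t code_points(const std::string& s) {
    return static_cast<std::size_t>(
        std::count_if(s.begin(), s.end(), [](char ch) {
            return (static_cast<unsigned char>(ch) & 0xC0) != 0x80;
        }));
}

// Both strings render as "é" on screen.
const std::string precomposed = "\xC3\xA9";   // U+00E9 LATIN SMALL LETTER E WITH ACUTE
const std::string decomposed  = "e\xCC\x81";  // U+0065 'e' + U+0301 COMBINING ACUTE ACCENT
```

A code-point count reports 1 for `precomposed` and 2 for `decomposed`, even though a reader sees one character in each.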