Handle UTF-8 string

慢半拍i 2021-01-13 06:30

As far as I know, Linux uses UTF-8 encoding. This means I can use std::string to handle strings, right? The encoding will just be UTF-8.

Now on UTF-8 we know s

5 answers
  •  终归单人心
    2021-01-13 06:54

    You may want to convert the UTF-8 encoded strings to a fixed-width encoding (such as UTF-32) before manipulating them. Whether that is worth it depends on what you are trying to do.
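As an illustration of the conversion idea, here is a minimal sketch that decodes UTF-8 into a std::u32string, so that each element is one code point. The function name utf8_to_utf32 is made up for this example, and the code assumes the input is already well-formed UTF-8; it does no validation, so treat it as a starting point rather than a robust decoder.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decodes UTF-8 into UTF-32, one char32_t per code point.
// Assumption: `in` is well-formed UTF-8. Malformed input is not detected.
std::u32string utf8_to_utf32(const std::string& in) {
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t extra;  // number of continuation bytes that follow the lead byte
        if (lead < 0x80)      { cp = lead;        extra = 0; }  // 0xxxxxxx
        else if (lead < 0xE0) { cp = lead & 0x1F; extra = 1; }  // 110xxxxx
        else if (lead < 0xF0) { cp = lead & 0x0F; extra = 2; }  // 1110xxxx
        else                  { cp = lead & 0x07; extra = 3; }  // 11110xxx
        ++i;
        // Each continuation byte contributes its low 6 bits.
        for (; extra > 0 && i < in.size(); --extra, ++i) {
            cp = (cp << 6) | (static_cast<unsigned char>(in[i]) & 0x3F);
        }
        out.push_back(cp);
    }
    return out;
}
```

After the conversion, indexing and size() work per code point, for example utf8_to_utf32(str).size(), at the cost of four bytes per element.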

    To get the length of a UTF-8 string in bytes, just use str.size(). Getting the length in characters (code points) is slightly more difficult: count every byte in the string whose value is >= 0x80 and < 0xC0, then subtract that count from the size of the string. In UTF-8 those values are always trailing (continuation) bytes, so what remains is exactly one byte per code point.

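    The above ignores the issue of combining characters, so the result depends on your definition of a character. For example, "e" followed by U+0301 (combining acute accent) is two code points but is displayed as a single character.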
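The counting rule described in this answer can be sketched roughly as follows. The function name utf8_length is invented for the example, and the code assumes the string holds well-formed UTF-8. It counts the bytes that are not continuation bytes, which is equivalent to subtracting the continuation bytes from str.size().

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns the number of code points in a UTF-8 string.
// Assumption: `s` is well-formed UTF-8. Combining characters are counted
// separately, so this is not a count of user-perceived characters.
std::size_t utf8_length(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char byte : s) {
        // Continuation bytes look like 10xxxxxx, i.e. values 0x80..0xBF.
        // Every other byte starts a new code point.
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For a pure ASCII string the result equals str.size(). For a string such as "e" plus U+0301 it returns 2, even though only one character is displayed.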
