Handle UTF-8 string

慢半拍i 2021-01-13 06:30

As far as I know, Linux uses UTF-8 encoding. This means I can use std::string for handling strings, right? The encoding will just be UTF-8.

Now on UTF-8 we know s

5 answers

终归单人心 (original poster)

2021-01-13 06:54

You may want to convert the UTF-8 encoded strings to a fixed-width encoding (UTF-32, for example) before manipulating them, but that depends on what you are trying to do.
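As a rough illustration of that conversion, here is a minimal sketch that decodes UTF-8 into a std::u32string, one char32_t per code point. The function name decode_utf8 is made up for this example, and the code assumes the input is already well-formed UTF-8: it does no validation of malformed or truncated sequences, which real code would need.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decodes UTF-8 into UTF-32 code points.
// Assumption: the input is well-formed UTF-8; nothing is validated here.
std::u32string decode_utf8(const std::string& s) {
    std::u32string out;
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        char32_t cp = 0;
        std::size_t len = 1;
        // The lead byte tells us how many bytes the sequence occupies.
        if (lead < 0x80)      { cp = lead;        len = 1; }  // 0xxxxxxx
        else if (lead < 0xE0) { cp = lead & 0x1F; len = 2; }  // 110xxxxx
        else if (lead < 0xF0) { cp = lead & 0x0F; len = 3; }  // 1110xxxx
        else                  { cp = lead & 0x07; len = 4; }  // 11110xxx
        // Each trailing byte contributes its low 6 bits.
        for (std::size_t k = 1; k < len && i + k < s.size(); ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        }
        out.push_back(cp);
        i += len;
    }
    return out;
}
```

After decoding, indexing and size() work per code point rather than per byte, which is what makes a fixed-width form convenient for manipulation.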

To get the length in bytes of a UTF-8 string, just call str.size(). Getting the length in characters (code points) is slightly harder: count every byte in the string whose value is >= 0x80 and < 0xC0, then subtract that count from the size of the string. In UTF-8 those values are always trailing (continuation) bytes, so what remains is exactly one byte per code point.
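The counting rule above could be sketched like this. The function name utf8_code_points is only illustrative, and the sketch assumes the string holds valid UTF-8; on malformed input the result is not meaningful.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: number of code points in a UTF-8 string.
// Assumption: s contains valid UTF-8.
std::size_t utf8_code_points(const std::string& s) {
    std::size_t trailing = 0;
    for (unsigned char byte : s) {
        // Trailing bytes have the bit pattern 10xxxxxx, i.e. 0x80..0xBF.
        if (byte >= 0x80 && byte < 0xC0) {
            ++trailing;
        }
    }
    // Every byte that is not a trailing byte starts a new code point.
    return s.size() - trailing;
}
```

For example, "caf\xC3\xA9" ("café") has a size() of 5 bytes, one of which is a trailing byte, so the function returns 4.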

The above ignores the issue of combining characters, so the result depends on what your definition of a character is.
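To make the combining-character caveat concrete, here is a small sketch with two strings that both display as "é" but contain a different number of code points. The names code_point_count, precomposed, and decomposed are invented for this example, and valid UTF-8 input is assumed.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: counts code points by skipping trailing bytes (10xxxxxx).
// Assumption: s contains valid UTF-8.
std::size_t code_point_count(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char byte : s) {
        if ((byte & 0xC0) != 0x80) {
            ++n;
        }
    }
    return n;
}

// Both strings render as "é", yet they differ byte for byte.
const std::string precomposed = "\xC3\xA9";   // U+00E9, a single code point
const std::string decomposed  = "e\xCC\x81";  // U+0065 followed by combining U+0301
```

Here code_point_count(precomposed) is 1 and code_point_count(decomposed) is 2, and the two strings do not compare equal with ==. If "character" should mean what the user sees, counting code points is not enough; that needs grapheme cluster segmentation and probably normalization, which the standard library does not provide.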
