C++ strings: UTF-8 or 16-bit encoding?

后端未结

关注

 8  1622

广开言路 2021-02-04 09:42

I\'m still trying to decide whether my (home) project should use UTF-8 strings (implemented in terms of std::string with additional UTF-8-specific functions when necessary) or s

8条回答

南方客 (楼主)

2021-02-04 10:46

UTF-16 is still a variable length character encoding (there are more than 2^16 unicode codepoints), so you can't do O(1) string indexing operations. If you're doing lots of that sort of thing, you're not saving anything in speed over UTF-8. On the other hand, if your text includes a lot of codepoints in the 256-65535 range, UTF-16 can be a substantial improvement in size. UCS-2 is a variation on UTF-16 that is fixed length, at the cost of prohibiting any codepoints greater than 2^16.

Without knowing more about your requirements, I would personally go for UTF-8. It's the easiest to deal with for all the reasons others have already listed.

0 讨论(0)

查看其它8个回答
发布评论:

提交评论
- 加载中...