Are all Kanji characters in UTF-8 3 bytes long?

拜拜、爱过 submitted on 2019-12-12 09:29:23

Question


Can someone please confirm that all Kanji characters in Chinese are 3 bytes long in UTF-8?


Answer 1:


The commonly used Hanzi/Kanji characters are in the "CJK Unified Ideographs" block between U+4E00 and U+9FFF, and take 3 bytes in UTF-8. (The Japanese Hiragana and Katakana characters also take 3 bytes.)
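A quick way to check this claim is to ask Python's built-in UTF-8 encoder for the length of every code point in the block. The sketch below assumes nothing beyond the standard `str.encode`. It collects the distinct byte lengths found in U+4E00 to U+9FFF, then does the same for a few sample Hiragana and Katakana characters.

```python
# Encode every code point in the CJK Unified Ideographs block
# (U+4E00..U+9FFF) and collect the distinct UTF-8 byte lengths seen.
CJK_START, CJK_END = 0x4E00, 0x9FFF

block_lengths = {
    len(chr(cp).encode("utf-8")) for cp in range(CJK_START, CJK_END + 1)
}
print(block_lengths)  # {3}

# Sample Hiragana and Katakana: each character is also 3 bytes.
kana = "ひらがなカタカナ"
kana_lengths = {len(ch.encode("utf-8")) for ch in kana}
print(kana_lengths)  # {3}
```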

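However, there are also some very rarely used characters in the "CJK Unified Ideographs Extension B" block (and the later extensions, C onward) and in the "CJK Compatibility Ideographs Supplement" block. These lie outside the Basic Multilingual Plane, at U+20000 and above, so they take 4 bytes in UTF-8.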
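To see the 4-byte case, the sketch below encodes the first and last code points of the two blocks named above (block boundaries taken from the Unicode block list; the dictionary labels are just descriptive). It also shows a common trap: Python's `len()` on a `str` counts code points, not bytes, so the byte count has to come from `.encode("utf-8")`.

```python
# Rare ideographs outside the Basic Multilingual Plane (code points
# >= U+10000) need 4 bytes in UTF-8.
samples = {
    "CJK Unified Ideographs Extension B (first)": 0x20000,
    "CJK Unified Ideographs Extension B (last)": 0x2A6DF,
    "CJK Compatibility Ideographs Supplement (first)": 0x2F800,
    "CJK Compatibility Ideographs Supplement (last)": 0x2FA1F,
}

for name, cp in samples.items():
    encoded = chr(cp).encode("utf-8")
    print(f"U+{cp:04X} {name}: {len(encoded)} bytes ({encoded.hex()})")

# len() on a str counts code points: this 4-byte character has length 1.
print(len(chr(0x20000)))                  # 1
print(chr(0x20000).encode("utf-8").hex())  # f0a08080
```

Code that assumes "one ideograph = 3 bytes" (for example, when truncating a byte buffer) would split these characters in the middle.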

Also be aware that Chinese text often contains ASCII characters, such as the digits 0-9, which take only 1 byte each in UTF-8.
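Because of this mix, the byte length of a Chinese string is generally not 3 times its character count. A small illustration with a date string (chosen arbitrarily here) that mixes ASCII digits and ideographs:

```python
from collections import Counter

text = "2019年12月12日"

# Tally characters by the number of UTF-8 bytes each one needs.
per_char = Counter(len(ch.encode("utf-8")) for ch in text)

print(len(text))                  # 11 characters (code points)
print(len(text.encode("utf-8")))  # 17 bytes: 8 digits x 1 + 3 ideographs x 3
print(dict(per_char))
```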




Answer 2:


Yes. Kanji are in the range U+4E00 to U+9FAF, and UTF-8 encodes every code point from U+0800 to U+FFFF in 3 bytes.
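The range rule above can be written as a small function. `utf8_length` is an illustrative name, not a library API; the boundaries follow RFC 3629. One detail the short answer glosses over: the surrogate code points U+D800 to U+DFFF sit inside the 3-byte range but have no valid UTF-8 encoding, so the sketch rejects them. The loop cross-checks a handful of boundary and CJK code points against Python's own encoder.

```python
def utf8_length(cp: int) -> int:
    """Return how many bytes UTF-8 uses for the code point `cp`.

    Boundaries follow RFC 3629: 1 byte below U+0080, 2 below U+0800,
    3 below U+10000, and 4 up to U+10FFFF. Surrogates (U+D800..U+DFFF)
    cannot be encoded and are rejected.
    """
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {cp:#x}")
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"surrogate code point U+{cp:04X} has no UTF-8 form")
    if cp < 0x80:
        return 1
    if cp < 0x800:
        return 2
    if cp < 0x10000:
        return 3
    return 4


# Cross-check boundary and CJK code points against Python's encoder.
for cp in (0x7F, 0x80, 0x7FF, 0x800, 0x4E00, 0x9FAF, 0xFFFF,
           0x10000, 0x20000, 0x10FFFF):
    assert utf8_length(cp) == len(chr(cp).encode("utf-8")), hex(cp)

print(utf8_length(0x4E00), utf8_length(0x20000))  # 3 4
```

This confirms that U+4E00 to U+9FAF falls entirely in the 3-byte range, while ideographs above U+FFFF, like those in Answer 1, need 4 bytes.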



Source: https://stackoverflow.com/questions/3678752/are-all-kanji-characters-in-utf-8-3-bytes-long
