C programming: How to program for Unicode?

予麋鹿

What prerequisites are needed to do strict Unicode programming?

Does this imply that my code should not use char types anywhere and that functions need

8 answers

猫巷女王i

2020-11-28 19:32

You basically want to deal with strings in memory as wchar_t arrays instead of char. When you do any kind of I/O (like reading or writing files), you can encode and decode using UTF-8, which is probably the most common encoding and is simple enough to implement; it is specified in RFC 3629. So nothing in memory should be multi-byte: one wchar_t represents one code point, provided wchar_t is 32 bits wide. When you come to serializing, however, that's when you need to encode to something like UTF-8, where some characters are represented by multiple bytes.
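A minimal sketch of the serializing direction, assuming each code point is held in a uint32_t. `utf8_encode` is a hypothetical helper name; the bit patterns are the ones RFC 3629 defines for UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
 * Writes 1 to 4 bytes to out and returns the count, or 0 if cp is a
 * UTF-16 surrogate or lies above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);
    if (cp <= 0x7F) {                          /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp <= 0x7FF) {                         /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)          /* surrogates are not characters */
        return 0;
    if (cp <= 0xFFFF) {                        /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                      /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                                  /* beyond the Unicode range */
}
```

For example, U+00E9 (é) comes out as the two bytes 0xC3 0xA9. Decoding on input is the mirror image: the lead byte tells you the sequence length, and each continuation byte contributes six more bits.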

You'll also need wide-character versions of strcmp etc., but this isn't a big issue: <wchar.h> already declares wcscmp, wcslen, wcscpy and friends. The biggest problem will be interop with libraries and existing code that only accept char arrays.
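For illustration, a hypothetical helper built only from standard <wchar.h> functions: wcslen and wcsncmp play the roles of strlen and strncmp for wide strings. Lengths are counted in wchar_t units, not bytes.

```c
#include <assert.h>
#include <wchar.h>

/* Hypothetical helper: returns 1 if s begins with prefix, 0 otherwise. */
int wide_starts_with(const wchar_t *s, const wchar_t *prefix)
{
    assert(s != NULL && prefix != NULL);
    size_t n = wcslen(prefix);          /* number of wchar_t units, not bytes */
    return wcsncmp(s, prefix, n) == 0;  /* compare at most n wide characters */
}
```

With a 32-bit wchar_t, wcslen(L"h\u00e9llo") is 5 regardless of how many bytes the same text would take in UTF-8.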

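And when it comes to sizeof(wchar_t), you will need 4 bytes if you want to do it right. On platforms where wchar_t is only 2 bytes (notably Windows, where it holds UTF-16), don't try to redefine wchar_t itself; use your own typedef for a 32-bit type such as uint32_t or C11's char32_t instead.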
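One way to sidestep the size of wchar_t entirely is to decode input straight into a 32-bit type of your own. This sketch assumes a hypothetical `codepoint` typedef over uint32_t and a hypothetical `utf8_decode` function; it rejects malformed input (overlong forms, surrogates, values above U+10FFFF, truncated sequences) rather than trying to repair it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-bit code point type, independent of sizeof(wchar_t). */
typedef uint32_t codepoint;

/* Decode len bytes of UTF-8 from src into dst (room for cap code points).
 * Returns the number of code points written, or (size_t)-1 if the input
 * is malformed or dst is too small. */
size_t utf8_decode(const unsigned char *src, size_t len,
                   codepoint *dst, size_t cap)
{
    assert(src != NULL || len == 0);
    size_t i = 0, n = 0;
    while (i < len) {
        unsigned char b = src[i];
        codepoint cp;
        size_t extra;                              /* continuation bytes that follow */
        if (b < 0x80)                    { cp = b;        extra = 0; }
        else if (b >= 0xC2 && b <= 0xDF) { cp = b & 0x1F; extra = 1; }
        else if (b >= 0xE0 && b <= 0xEF) { cp = b & 0x0F; extra = 2; }
        else if (b >= 0xF0 && b <= 0xF4) { cp = b & 0x07; extra = 3; }
        else return (size_t)-1;                    /* stray continuation or invalid lead byte */
        if (extra > len - i - 1)
            return (size_t)-1;                     /* sequence cut off by end of input */
        for (size_t k = 1; k <= extra; k++) {
            unsigned char c = src[i + k];
            if ((c & 0xC0) != 0x80)
                return (size_t)-1;                 /* expected a 10xxxxxx byte */
            cp = (cp << 6) | (c & 0x3F);
        }
        /* Reject overlong 3- and 4-byte forms, surrogates, and values past
         * U+10FFFF (overlong 2-byte forms are excluded by the C0/C1 check). */
        if ((extra == 2 && cp < 0x800) || (extra == 3 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return (size_t)-1;
        if (n == cap)
            return (size_t)-1;                     /* output buffer full */
        dst[n++] = cp;
        i += 1 + extra;
    }
    return n;
}
```

Failing on the first bad byte keeps the sketch short; a more forgiving decoder could substitute U+FFFD (the replacement character) and continue.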

鱼传尺愫

2020-11-28 19:33
To do strict Unicode programming:
- Only use string APIs that are Unicode-aware (NOT strlen, strcpy, ... but their wide-string counterparts wcslen, wcscpy, ...).
- When dealing with a block of text, use an encoding that can store any Unicode character without loss (UTF-7, UTF-8, UTF-16, UTF-32, ...). UCS-2 does not qualify, since it covers only the Basic Multilingual Plane.
- Check that your OS default character set is Unicode-compatible (e.g. UTF-8).
- Use fonts with broad Unicode coverage (e.g. Arial Unicode MS).
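The character-set check in the third point can be done programmatically on POSIX systems. This is a sketch assuming a POSIX platform: nl_langinfo and <langinfo.h> are not part of ISO C, and Windows needs a different approach. `locale_is_utf8` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L  /* request POSIX declarations on strict compilers */
#include <assert.h>
#include <langinfo.h>            /* POSIX, not ISO C */
#include <locale.h>
#include <string.h>

/* Hypothetical helper: switch to the locale named by the environment
 * (LC_ALL, LC_CTYPE, LANG) and report whether its codeset is UTF-8.
 * Note that setlocale changes program-wide state. */
int locale_is_utf8(void)
{
    if (setlocale(LC_ALL, "") == NULL)
        return 0;  /* the environment names a locale that is not installed */
    const char *codeset = nl_langinfo(CODESET);
    assert(codeset != NULL);
    return strcmp(codeset, "UTF-8") == 0;
}
```

A program that depends on UTF-8 could call this once at startup and warn the user if it returns 0.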
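Multibyte character sequences are the locale-dependent encodings (such as Shift-JIS, or UTF-8 itself) that the C library converts to and from wchar_t with functions like mbstowcs and mbrtowc. Many legacy multibyte encodings pre-date UTF-16 (the encoding wchar_t normally holds on Windows; on Linux, wchar_t is 32 bits), but the concept is part of standard C rather than Windows-only.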
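To see how standard C handles multibyte strings, here is a sketch that counts the characters (not bytes) in a string by stepping through it with mbrtowc. `mb_char_count` is a hypothetical name; the result depends on the LC_CTYPE locale, so a real program would normally call setlocale first.

```c
#include <assert.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: count the characters in a NUL-terminated multibyte
 * string as interpreted by the current LC_CTYPE locale.
 * Returns (size_t)-1 if the string is invalid or ends mid-character. */
size_t mb_char_count(const char *s)
{
    assert(s != NULL);
    mbstate_t state;
    memset(&state, 0, sizeof state);   /* start in the initial shift state */
    size_t left = strlen(s);           /* bytes still to examine */
    size_t count = 0;
    while (left > 0) {
        wchar_t wc;
        size_t r = mbrtowc(&wc, s, left, &state);
        if (r == (size_t)-1 || r == (size_t)-2)
            return (size_t)-1;         /* invalid (-1) or incomplete (-2) sequence */
        if (r == 0)
            break;                     /* null wide character (defensive; not expected here) */
        s += r;                        /* advance by the bytes this character used */
        left -= r;
        count++;
    }
    return count;
}
```

In a UTF-8 locale, "h\xC3\xA9llo" would count as 5 characters even though strlen reports 6 bytes.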

I've never heard of wint_t.
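For reference, wint_t is declared in <wchar.h> as the type returned by fgetwc and related functions: it can hold every value of the extended character set plus WEOF, a value that corresponds to no character. A sketch with a hypothetical `count_wide_chars` helper:

```c
#include <assert.h>
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: count the wide characters remaining in a stream.
 * fgetwc returns wint_t rather than wchar_t so that end-of-file or an
 * error can be reported as WEOF without colliding with a real character. */
size_t count_wide_chars(FILE *fp)
{
    assert(fp != NULL);
    size_t n = 0;
    wint_t c;
    while ((c = fgetwc(fp)) != WEOF)
        n++;
    return n;
}
```

This mirrors the narrow-character idiom of storing getc's result in an int so it can be compared with EOF.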