C programming: How to program for Unicode?

予麋鹿 2020-11-28 18:26

What prerequisites are needed to do strict Unicode programming?

Does this imply that my code should not use char types anywhere and that functions need

8 answers
  • 2020-11-28 19:32

    You basically want to deal with strings in memory as wchar_t arrays instead of char. When you do any kind of I/O (like reading or writing files), you encode/decode using UTF-8 (probably the most common encoding), which is simple enough to implement yourself; it is specified in RFC 3629. So in memory nothing should be multi-byte: one wchar_t holds one Unicode code point, provided wchar_t is 32 bits wide (see the note on sizeof(wchar_t) below). Keep in mind that a code point is not always what a user sees as one character; "é", for example, can also be stored as "e" followed by a combining acute accent. When you come to serializing, that's when you need to encode to something like UTF-8, where some code points are represented by multiple bytes.
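    To make the "encode to UTF-8 when serializing" step concrete, here is a minimal sketch that encodes one code point using the bit layout from RFC 3629. The name utf8_encode is hypothetical, and it takes a uint32_t rather than a wchar_t so it does not depend on the platform's wchar_t width.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8 (RFC 3629).
 * Writes 1-4 bytes into out (which must have room for 4) and returns the
 * byte count, or 0 if cp is a surrogate or lies above U+10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                          /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                         /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)         /* surrogates are not encodable */
        return 0;
    if (cp < 0x10000) {                       /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                     /* 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;                                 /* beyond the Unicode range */
}
```

    For example, U+20AC (the euro sign) comes out as the three bytes E2 82 AC. Serializing a whole in-memory string is then just a loop over its code points that writes each returned byte run to the file.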

    You don't have to write your own versions of strcmp etc. for wide character strings: <wchar.h> has provided counterparts such as wcscmp, wcslen and wcscpy since C95. The biggest problem will be interop with libraries and existing code that only accept char arrays.
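    As a small illustration of the <wchar.h> string functions, here is a sketch of a bounds-checked copy. The helper name wide_copy_checked is hypothetical; wcslen and wcscpy are the standard wide counterparts of strlen and strcpy.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: copy src into dst (capacity cap, counted in wchar_t
 * units) only if it fits. Returns 0 on success, -1 if src is too long. */
int wide_copy_checked(wchar_t *dst, size_t cap, const wchar_t *src)
{
    size_t len = wcslen(src);    /* counts wchar_t units, not bytes */
    if (len + 1 > cap)           /* +1 for the terminating L'\0' */
        return -1;
    wcscpy(dst, src);
    return 0;
}
```

    Note that wcscmp compares wchar_t values numerically, which is fine for equality tests; for locale-aware ordering the standard function is wcscoll.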

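    As for sizeof(wchar_t): you need 4 bytes to hold every code point directly. That is what Linux and macOS use, but on Windows wchar_t is 2 bytes and holds UTF-16 code units, so characters outside the Basic Multilingual Plane need two of them (a surrogate pair). Don't redefine wchar_t itself with typedef/macro hacks, because the C library and every other library you link against were compiled with the platform's definition. If you need a type of at least 32 bits, use a separate one such as uint32_t, or char32_t from C11's <uchar.h>.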
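    Instead of changing wchar_t, one option is to decode UTF-8 input straight into uint32_t code points, which have the same width on every platform. This is a sketch under that assumption; utf8_decode and utf8_to_utf32 are hypothetical names, and the decoder rejects overlong forms, surrogates, values above U+10FFFF and truncated sequences rather than trying to recover from them.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decode one UTF-8 sequence from s (len bytes available).
 * Stores the code point in *cp and returns the bytes consumed (1-4), or 0 for
 * malformed, overlong, surrogate, out-of-range, or truncated input. */
size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *cp)
{
    size_t n;
    uint32_t c, min;

    if (len == 0)
        return 0;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { n = 2; c = s[0] & 0x1F; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { n = 3; c = s[0] & 0x0F; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { n = 4; c = s[0] & 0x07; min = 0x10000; }
    else return 0;                            /* stray continuation byte or 0xF8-0xFF */

    if (len < n)
        return 0;                             /* sequence cut off */
    for (size_t i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;                         /* expected a 10xxxxxx byte */
        c = (c << 6) | (s[i] & 0x3F);
    }
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return 0;                             /* overlong, too large, or surrogate */
    *cp = c;
    return n;
}

/* Hypothetical helper: decode a NUL-terminated UTF-8 string into out
 * (capacity cap). Returns the number of code points, or (size_t)-1 on
 * malformed input or if out is too small. */
size_t utf8_to_utf32(const char *src, uint32_t *out, size_t cap)
{
    const unsigned char *p = (const unsigned char *)src;
    size_t left = strlen(src), count = 0;

    while (left > 0) {
        uint32_t cp;
        size_t used = utf8_decode(p, left, &cp);
        if (used == 0 || count == cap)
            return (size_t)-1;
        out[count++] = cp;
        p += used;
        left -= used;
    }
    return count;
}
```

    Rejecting bad input outright keeps the sketch short; a real reader might instead substitute U+FFFD and continue.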

  • 2020-11-28 19:33

    To do strict Unicode programming:

    • Only use string APIs that are Unicode-aware (NOT strlen, strcpy, ... but their wide-string counterparts wcslen, wcscpy, ...)
    • When dealing with a block of text, use an encoding that can store any Unicode character without loss (UTF-7, UTF-8, UTF-16, UTF-32, ...). UCS-2 does not qualify, since it only covers the Basic Multilingual Plane.
    • Check that your OS default character set is Unicode-compatible (e.g. UTF-8). A C program only picks it up after calling setlocale(LC_ALL, ""); until then it runs in the "C" locale.
    • Use fonts that cover the characters you need (e.g. Arial Unicode MS)
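    To show why plain strlen is not Unicode-aware: on a UTF-8 string it counts bytes, not characters. A minimal sketch (utf8_count_code_points is a hypothetical name) counts code points instead by skipping continuation bytes; it assumes the input is already valid UTF-8 and does not check it.

```c
#include <stddef.h>

/* Hypothetical helper: count the code points in a valid, NUL-terminated
 * UTF-8 string by counting every byte that is not a continuation byte
 * (10xxxxxx). No validation is performed. */
size_t utf8_count_code_points(const char *s)
{
    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
        if ((*p & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

    For "naïve" encoded as UTF-8, strlen reports 6 because "ï" takes two bytes, while this function reports 5.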

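    Multibyte character sequences are not one specific encoding: they are whatever encoding the current C locale uses for char strings, which may be UTF-8 (as on most modern Linux and macOS systems) or a legacy code page. The standard C functions mbstowcs/mbsrtowcs and wcstombs/wcsrtombs convert between them and wchar_t, so they are not Windows-only. Likewise, wchar_t normally holds UTF-16 only on Windows; on Linux and macOS it holds UTF-32.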
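    In standard C, converting a string from the locale's multibyte encoding to wchar_t can look like the following sketch. locale_to_wide is a hypothetical name, and the program is assumed to have called setlocale(LC_ALL, "") once at startup; otherwise the "C" locale is used, which typically only accepts ASCII. It uses mbsrtowcs because the standard explicitly allows a null destination there, which turns the first call into a length query.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: convert a string in the current locale's multibyte
 * encoding into a newly allocated wide string. Returns NULL if the input is
 * invalid in that encoding or allocation fails; the caller must free() it. */
wchar_t *locale_to_wide(const char *mb)
{
    mbstate_t state;
    const char *src = mb;

    memset(&state, 0, sizeof state);
    /* With a NULL destination, mbsrtowcs only measures: the result is the
     * length in wchar_t units, excluding the terminator. */
    size_t n = mbsrtowcs(NULL, &src, 0, &state);
    if (n == (size_t)-1)
        return NULL;                      /* invalid sequence for this locale */

    wchar_t *w = malloc((n + 1) * sizeof *w);
    if (w == NULL)
        return NULL;

    src = mb;
    memset(&state, 0, sizeof state);
    mbsrtowcs(w, &src, n + 1, &state);    /* converts, including the L'\0' */
    return w;
}
```

    The reverse direction (wchar_t back to the locale encoding) follows the same two-pass pattern with wcsrtombs.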

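    As for wint_t: it is the integer type from <wchar.h> that can hold every wchar_t value plus the end-of-file marker WEOF, which is why fgetwc() returns wint_t rather than wchar_t, just as getc() returns int rather than char.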
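    A sketch of the typical wint_t read loop, mirroring the classic int/EOF loop with getc. count_wide_chars is a hypothetical name; the decoding of bytes into wide characters depends on the current locale, as with any wide-oriented stream.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: count the wide characters readable from a stream.
 * The loop variable is wint_t, not wchar_t, so WEOF can be told apart from
 * every valid character. fgetwc also returns WEOF on a read or encoding
 * error, so feof() decides whether the loop ended normally.
 * Returns the count, or -1 if it stopped because of an error. */
long count_wide_chars(FILE *f)
{
    long count = 0;
    wint_t c;

    while ((c = fgetwc(f)) != WEOF)
        count++;
    return feof(f) ? count : -1;
}
```

    Storing the result of fgetwc in a wchar_t instead would make the WEOF comparison unreliable, which is the same bug as storing getc's result in a char.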
