_T( ) macro changes for UNICODE character data

渐次进展 2021-01-13 01:02

I have a UNICODE application in which we use _T(x), which is defined as follows.

#if defined(_UNICODE)
#define _T(x) L ##x
#else
#define _T(x) x
#endif
2 Answers
  • 2021-01-13 01:32

    Ah! The wonders of portability :-)

    If you have a C99 compiler for all your platforms, use int_least16_t, uint_least16_t, ... from <stdint.h>. Most platforms also define int16_t, but it is not required to exist: the typedef must be defined only if the platform actually has an integer type of exactly 16 bits.

    Now store all the strings as arrays of uint_least16_t, and make sure your code does not expect values of uint_least16_t to wrap at 65535, because the type may be wider than 16 bits.
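
    A minimal sketch of this suggestion, written in C++ (in C99 the same typedefs come from <stdint.h>). The names utf16_unit, kAbc, utf16_length and add_wrap16 are hypothetical and chosen only for illustration. The masking in add_wrap16 shows one way to avoid relying on wrap-around at 65535:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical alias for one UTF-16 code unit. The type is at least
// 16 bits wide, but may be wider on platforms without an exact 16-bit type.
typedef std::uint_least16_t utf16_unit;

// "abc" stored as explicit code units, terminated by 0.
static const utf16_unit kAbc[] = { 0x0061, 0x0062, 0x0063, 0x0000 };

// Length in code units, not counting the terminator.
inline std::size_t utf16_length(const utf16_unit* s) {
    assert(s != nullptr);
    std::size_t n = 0;
    while (s[n] != 0) {
        ++n;
    }
    return n;
}

// Adds two code units and masks the result to 16 bits explicitly,
// so the outcome is the same whether or not the type is wider than 16 bits.
inline utf16_unit add_wrap16(utf16_unit a, utf16_unit b) {
    const std::uint_least32_t sum = static_cast<std::uint_least32_t>(a) + b;
    return static_cast<utf16_unit>(sum & 0xFFFFu);
}
```

    Any arithmetic on code units that is meant to be modulo 65536 would go through a helper like add_wrap16 rather than through a plain cast.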

  • 2021-01-13 01:41

    You can't, not without C++0x support. C++0x defines the following ways of declaring string literals:

    • "string of char characters in some implementation-defined encoding" - char
    • u8"string of UTF-8 chars" - char
    • u"string of UTF-16 chars" - char16_t
    • U"string of UTF-32 chars" - char32_t
    • L"string of wchar_t in some implementation-defined encoding" - wchar_t
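
    As a rough illustration of the list above, assuming a compiler with C++0x (C++11) string literal support: the hypothetical helper code_units counts the code units in a literal, which makes the difference between the encodings visible for a character outside the Basic Multilingual Plane. The L"" case is left out because its result depends on the platform's wchar_t.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical helper: number of code units in a string literal,
// not counting the terminating zero.
template <typename T, std::size_t N>
constexpr std::size_t code_units(const T (&)[N]) {
    return N - 1;
}

// u"" literals are arrays of char16_t, U"" literals arrays of char32_t.
static_assert(std::is_same<std::remove_cv<std::remove_reference<
                  decltype(u"a"[0])>::type>::type, char16_t>::value,
              "u\"\" literals use char16_t");
static_assert(std::is_same<std::remove_cv<std::remove_reference<
                  decltype(U"a"[0])>::type>::type, char32_t>::value,
              "U\"\" literals use char32_t");

// U+1F600 lies outside the BMP: four UTF-8 code units,
// two UTF-16 code units (a surrogate pair), one UTF-32 code unit.
static_assert(code_units(u8"\U0001F600") == 4, "UTF-8");
static_assert(code_units(u"\U0001F600") == 2, "UTF-16");
static_assert(code_units(U"\U0001F600") == 1, "UTF-32");
```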

    Until C++0x is widely supported, the only cross-platform way to encode a UTF-16 string is to spell it out one code unit at a time:

    // make a char16_t type to stand in until msvc/gcc/etc supports
    // c++0x utf string literals
    #ifndef CHAR16_T_DEFINED
    #define CHAR16_T_DEFINED
    typedef unsigned short char16_t;
    #endif
    
    const char16_t strABC[] = { 'a', 'b', 'c', '\0' };
    // the same declaration would work for a type that changes from 8 to 16 bits:
    
    #ifdef _UNICODE
    typedef char16_t TCHAR;
    #else
    typedef char TCHAR;
    #endif
    const TCHAR strABC2[] = { 'a', 'b', 'c', '\0' };
    

    The _T macro can only deliver the goods on platforms where wchar_t is 16 bits wide. And the alternative is still not truly cross-platform: the encoding of char and wchar_t is implementation-defined, so 'a' does not necessarily encode the Unicode code point for 'a' (0x61). Thus, to be strictly accurate, this is the only way of writing the string:

    const TCHAR strABC[] = { '\x61', '\x62', '\x63', '\0' };
    

    Which is just horrible.
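
    Hand-written code units get more awkward still for characters outside the Basic Multilingual Plane, which need two UTF-16 code units each. The following is only a sketch of how such values could be computed rather than typed by hand; unit16 and encode_utf16 are hypothetical names, and the function assumes its input is a valid Unicode scalar value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a UTF-16 code unit, independent of wchar_t.
typedef std::uint_least16_t unit16;

// Writes the UTF-16 encoding of code point cp into out and returns the
// number of code units written (1 or 2). Assumes cp is a valid Unicode
// scalar value: at most 0x10FFFF and not in the surrogate range.
inline std::size_t encode_utf16(std::uint_least32_t cp, unit16 out[2]) {
    assert(cp <= 0x10FFFFu);
    assert(cp < 0xD800u || cp > 0xDFFFu);
    if (cp < 0x10000u) {
        // BMP code points map to a single code unit of the same value.
        out[0] = static_cast<unit16>(cp);
        return 1;
    }
    // Supplementary code points are split into a high and a low surrogate.
    cp -= 0x10000u;
    out[0] = static_cast<unit16>(0xD800u + (cp >> 10));
    out[1] = static_cast<unit16>(0xDC00u + (cp & 0x3FFu));
    return 2;
}
```

    For example, calling encode_utf16(0x61, buf) stores the single unit 0x0061, the same value as '\x61' in the array above.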
