Question
As the title says, I have two questions.
Edit: To clarify, they don't actually use char and short; they guarantee the 8-bit and 16-bit widths through specific typedefs. The actual types are then called UInt8 and UInt16.
1. Question
The iTunes SDK uses unsigned short* where a string is needed. What are the advantages of using it instead of char* / unsigned char*? How do I convert it to char*, and what differs when working with this type instead?
2. Question
So far I've only seen char* used when a string must be stored. When should I use unsigned char* instead, or does it make no difference?
Answer 1:
unsigned short arrays can be used for wide character strings - for instance if you have UTF-16 encoded text - although I'd expect to see wchar_t in those cases. But they may have their reasons, like staying compatible between MacOS and Windows. (If my sources are right, MacOS' wchar_t is 32 bits, while Windows' is 16 bits.)
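A quick way to verify that platform difference yourself, as a minimal standard-C check:

```c
#include <stdio.h>
#include <wchar.h>

/* Prints the width of wchar_t on the current platform; typically
 * 4 bytes on MacOS and 2 bytes on Windows, as noted above. */
int main(void)
{
    printf("sizeof(wchar_t) = %zu bytes\n", sizeof(wchar_t));
    return 0;
}
```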
You convert between the two types of string by calling the appropriate library function. Which function is appropriate depends on the situation. Doesn't the SDK come with one?
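If the SDK doesn't provide one, here is a deliberately naive sketch of such a conversion; utf16_to_ascii is a hypothetical helper (not an SDK function) and it only keeps code units in the ASCII range:

```c
#include <stddef.h>

/* Hypothetical sketch: copies UTF-16 code units that fit in ASCII
 * into a char buffer and replaces everything else with '?'. A real
 * conversion would handle surrogate pairs and a target encoding. */
static void utf16_to_ascii(const unsigned short *src, char *dst, size_t dstlen)
{
    size_t i = 0;
    if (dstlen == 0)
        return;
    while (src[i] != 0 && i + 1 < dstlen) {
        dst[i] = (src[i] < 0x80) ? (char)src[i] : '?';
        i++;
    }
    dst[i] = '\0';
}
```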
And char instead of unsigned char: well, strings have historically always been defined with char, so switching to unsigned char would introduce incompatibilities.
(Switching to signed char would also cause incompatibilities, but somehow not as many...)
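To make the incompatibility concrete (standard library only, nothing SDK-specific): the string functions are declared in terms of char*, so an unsigned char* needs an explicit cast at every call site:

```c
#include <stdio.h>
#include <string.h>

int main(void)
{
    unsigned char name[] = "iTunes";
    /* strlen() takes const char*; an unsigned char* won't convert
     * implicitly, so every call needs a cast. */
    size_t len = strlen((const char *)name);
    printf("%zu\n", len);
    return 0;
}
```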
Edit: Now that the question has been edited, let me say that I didn't see the edits before I typed my answer. But yes, UInt16 is a better representation of a 16-bit entity than wchar_t, for the reason above.
Answer 2:
1. Question - Answer
I would suppose that they use unsigned short* because they must be using UTF-16 encoding for Unicode characters, which can represent characters both inside and outside the BMP. The rest of your question depends on the Unicode encoding of the source and the destination (UTF-8, UTF-16, or UTF-32).
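As a concrete illustration of the BMP point, this is the standard UTF-16 algorithm (not SDK code) splitting a code point above the BMP into a surrogate pair of two unsigned short code units:

```c
#include <stdio.h>

int main(void)
{
    unsigned long cp = 0x1F600;        /* a code point above the BMP */
    unsigned long v  = cp - 0x10000;   /* 20-bit value to be split */
    unsigned short hi = (unsigned short)(0xD800 + (v >> 10));   /* high surrogate */
    unsigned short lo = (unsigned short)(0xDC00 + (v & 0x3FF)); /* low surrogate */
    printf("U+%05lX -> 0x%04X 0x%04X\n", cp, (unsigned)hi, (unsigned)lo);
    return 0;
}
```

Running it prints U+1F600 -> 0xD83D 0xDE00, i.e. one character occupies two 16-bit code units.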
2. Question - Answer
Again, it depends on the encoding and on which strings you're talking about. You should never use plain signed or unsigned char if you plan to deal with strings containing characters outside the Extended ASCII table (that is, any language other than English).
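One classic pitfall behind that advice, assuming a Latin-1 encoded byte for illustration: values above 0x7F become negative in a signed char, and passing a negative value (other than EOF) to the <ctype.h> functions is undefined behavior:

```c
#include <ctype.h>
#include <stdio.h>

int main(void)
{
    char c = (char)0xE9;   /* 'é' in Latin-1; negative where char is signed */
    /* isalpha() requires a value representable as unsigned char (or EOF),
     * so the argument must be converted first. */
    printf("%d\n", isalpha((unsigned char)c));
    return 0;
}
```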
Answer 3:
Probably a harebrained attempt to use UTF-16 strings. C has a wide character type, wchar_t, and its chars (or wchar_ts) can be 16 bits long. Though I'm not familiar enough with the SDK to say exactly why they went this route, it's probably to work around compiler issues. In C99 there are much more suitable [u]int[least/fast]16_t types - see <stdint.h>.
Note that C makes very few guarantees about data types and their underlying sizes. Signed and unsigned shorts aren't guaranteed to be 16 bits (though they are guaranteed to be at least that wide), nor are chars restricted to 8 bits or wide chars to 16 or 32.
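A small C99 sketch contrasting the exact-width <stdint.h> types with the merely minimum-width built-ins:

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* uint16_t is exactly 16 bits wherever it is defined; unsigned
     * short is only guaranteed to be at least 16 bits. */
    uint16_t code_unit = 0xD83Du;
    printf("sizeof(unsigned short) = %zu\n", sizeof(unsigned short));
    printf("sizeof(uint16_t)      = %zu\n", sizeof(uint16_t));
    printf("code unit: 0x%04X\n", (unsigned)code_unit);
    return 0;
}
```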
To convert between char and short strings, you'd use the conversion functions provided by the SDK. You could also write your own or use a 3rd party library, if you knew exactly what they stored in those short strings AND what you wanted in your char strings.
It doesn't really make a difference. You'd normally convert to unsigned char if you wanted to do (unsigned) arithmetic or bit manipulation on a character.
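For instance, a byte-wise checksum is a minimal sketch of why you'd go through unsigned char for such arithmetic (checksum here is an illustrative helper, not from any SDK):

```c
#include <stdio.h>

/* Illustrative helper: sums the bytes of a string. Going through
 * unsigned char avoids sign-extension surprises for bytes > 0x7F. */
static unsigned checksum(const char *s)
{
    unsigned sum = 0;
    while (*s)
        sum += (unsigned char)*s++;
    return sum;
}

int main(void)
{
    printf("%u\n", checksum("iTunes"));
    return 0;
}
```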
Edit: I wrote (or started writing, anyhow) this answer before you told us they used UInt16 and not unsigned short. In that case there are no hare brains involved; the proprietary type is probably used to store UTF-16 data while remaining compatible with older (or noncompliant) compilers that lack the stdint types. Which is perfectly reasonable.
Source: https://stackoverflow.com/questions/9295363/why-short-instead-of-char-for-string-difference-between-char-and-unsigned-ch