Every now and then, someone on SO points out that char (aka 'byte') isn't necessarily 8 bits. It seems that the 8-bit char is almost universal.
For one thing, Unicode characters need more than 8 bits. As someone mentioned earlier, the C spec defines data types by their minimum sizes. Use sizeof and the values in limits.h if you want to interrogate your data types and discover exactly what size they are for your configuration and architecture. For this reason, I try to stick to types like uint16_t when I need a data type of a particular bit length.
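For example, a quick probe like this (just a sketch; the exact numbers will vary by compiler and architecture) shows what sizeof and limits.h report on your setup:

    #include <limits.h>   /* CHAR_BIT and the integer-limit macros */
    #include <stdint.h>   /* fixed-width types such as uint16_t */
    #include <stdio.h>

    int main(void)
    {
        /* Number of bits in a char on this implementation (at least 8). */
        printf("CHAR_BIT       = %d\n", CHAR_BIT);

        /* sizeof reports sizes in units of char, not necessarily octets. */
        printf("sizeof(int)    = %zu\n", sizeof(int));
        printf("sizeof(long)   = %zu\n", sizeof(long));

        /* A fixed-width type: exactly 16 bits wherever it is provided. */
        uint16_t sample = 0xBEEF;
        printf("sizeof(sample) = %zu\n", sizeof sample);

        return 0;
    }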
Edit: Sorry, I initially misread your question.
The C spec says that a char object is "large enough to store any member of the execution character set". limits.h guarantees a minimum size of 8 bits (CHAR_BIT is at least 8), but the definition leaves the maximum size of a char open.
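For what it's worth, if your code quietly relies on that minimum being the actual size, you can turn the assumption into a compile-time check; something along these lines (a minimal sketch, nothing more):

    #include <limits.h>

    /* If later code genuinely depends on 8-bit chars (octet-oriented I/O,
     * masks like 0xFF covering a whole char, etc.), failing the build is
     * safer than miscomputing at run time. */
    #if CHAR_BIT != 8
    #error "This code assumes an 8-bit char."
    #endif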
Thus, a char is at least as long as the largest character in your architecture's execution set (typically rounded up to the nearest 8-bit boundary). If your architecture has longer opcodes, your char size may be longer.
Historically, the x86 platform's opcode was one byte long, so char was initially an 8-bit value. Current x86 platforms support opcodes longer than one byte, but char has been kept at 8 bits in length since that's what programmers (and the large volumes of existing x86 code) are accustomed to.
When thinking about multi-platform support, take advantage of the types defined in stdint.h. If you use (for instance) a uint16_t, then you can be sure that this value is an unsigned 16-bit value on whatever architecture, whether that 16-bit value corresponds to a char, a short, an int, or something else. Most of the hard work has already been done by the people who wrote your compiler and standard libraries.
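A rough illustration (the variable names and values here are just for demonstration):

    #include <inttypes.h>   /* PRIu16 format macro */
    #include <stdint.h>     /* uint16_t, UINT16_MAX */
    #include <stdio.h>

    int main(void)
    {
        /* Exactly 16 bits, regardless of whether the implementation maps
         * it onto char, short, int, or something more exotic. */
        uint16_t flags = UINT16_MAX;   /* always 0xFFFF */

        /* PRIu16 expands to whatever printf conversion fits uint16_t here. */
        printf("flags = %" PRIu16 "\n", flags);
        return 0;
    }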
If you need to know the exact size of a char because you are doing some low-level hardware manipulation that requires it, I typically use a data type that is large enough to hold a char on all supported platforms (usually 16 bits is enough) and run the value through a convert_to_machine_char routine when I need the exact machine representation. That way, the platform-specific code is confined to the interface function, and most of the time I can use a normal uint16_t.
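The shape of that interface function might look something like this (purely a sketch; the typedef, signature, and body are assumptions on my part, only the name comes from the description above):

    #include <stdint.h>

    /* Substitute whatever type actually matches the target's character
     * representation; unsigned char is just a placeholder here. */
    typedef unsigned char machine_char;

    machine_char convert_to_machine_char(uint16_t portable)
    {
        /* On an ordinary 8-bit-char target this is a plain narrowing;
         * a platform with 9- or 16-bit chars would do its own packing here. */
        return (machine_char)(portable & 0xFFu);
    }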