When answering a comment to another answer of mine here, I found what I think may be a hole in the C standard (c1x, I haven't checked the earlier ones and yes, I k
Aren't the units of "size_t sz" in whatever the addressable unit of your architecture is? I work with a DSP whose addresses correspond to 32-bit values, not bytes. malloc(1) gets me a pointer to a 4-byte area.
In a 16-bit char environment, malloc(10 * sizeof(char)) will allocate 10 chars (10 bytes), because if char is 16 bits, then that architecture/implementation defines a byte as 16 bits. A char isn't an octet, it's a byte. On older computers this can be larger than the 8-bit de facto standard we have today.
The relevant section from the C standard follows:
3.6 Terms, definitions and symbols
byte - addressable unit of data storage large enough to hold any member of the basic character set of the execution environment...
NOTE 2 - A byte is composed of a contiguous sequence of bits, the number of which is implementation-defined.
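The byte/char relationship described above can be checked directly in code. This is only a minimal sketch assuming a C11 compiler; bits_in_bytes and alloc_chars are hypothetical helper names made up for illustration, not anything from the standard:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

/* Both hold on every conforming implementation, including DSPs
   where char is 16 or 32 bits wide. */
_Static_assert(sizeof(char) == 1, "a char is exactly one byte");
_Static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");

/* Hypothetical helper: number of bits of storage in nbytes bytes. */
size_t bits_in_bytes(size_t nbytes)
{
    return nbytes * CHAR_BIT;
}

/* Hypothetical helper: allocates room for count chars, i.e. count bytes. */
char *alloc_chars(size_t count)
{
    return malloc(count * sizeof(char));   /* same as malloc(count) */
}
```

On a platform with 8-bit bytes bits_in_bytes(10) is 80; on a 16-bit-char platform the same call gives 160, even though both allocate "10 bytes" in the C sense.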
In the C99 standard the rigorous correlation between bytes, char, and object size is given in 6.2.6.1/4 "Representations of types - General":
Values stored in non-bit-field objects of any other object type consist of n × CHAR_BIT bits, where n is the size of an object of that type, in bytes. The value may be copied into an object of type unsigned char [n] (e.g., by memcpy); the resulting set of bytes is called the object representation of the value.
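To make the quoted paragraph concrete, here is a small sketch that copies a value into an unsigned char [n] array with memcpy and back again. The function name roundtrip_long is hypothetical, and long is just an arbitrary example type:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: round-trips a value through its object
   representation, i.e. an array of sizeof(long) unsigned chars. */
long roundtrip_long(long value)
{
    unsigned char repr[sizeof value];   /* unsigned char [n], n = sizeof(long) */
    long copy;

    memcpy(repr, &value, sizeof value); /* value -> object representation */
    memcpy(&copy, repr, sizeof copy);   /* object representation -> value */
    return copy;
}
```

The array has sizeof(long) elements regardless of how many bits each byte has, which is exactly the n in n × CHAR_BIT.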
In the C++ standard the same relationship is given in 3.9/2 "Types":
For any object (other than a base-class subobject) of POD type T, whether or not the object holds a valid value of type T, the underlying bytes (1.7) making up the object can be copied into an array of char or unsigned char. If the content of the array of char or unsigned char is copied back into the object, the object shall subsequently hold its original value.
In C90 the correlation doesn't appear to be stated as explicitly, but from the definition of a byte, the definition of a character, and the definition of the sizeof operator, the inference can be made that a char type is equivalent to a byte.
Also note that the number of bits in a byte (and the number of bits in a char) is implementation-defined; strictly speaking it doesn't need to be 8 bits. And onebyone points out in a comment elsewhere that DSPs commonly have bytes with a number of bits that isn't 8.
Note that IETF RFCs and standards generally (always?) use the term 'octet' instead of 'byte' to be unambiguous that the units they're talking about have exactly 8 bits - no more, no less.
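As an illustration of why the octet/byte distinction matters in code, the sketch below splits a 32-bit value into four octets by masking and shifting, so each array element holds only 0..255 even where CHAR_BIT is larger than 8. The function names are hypothetical and the most-significant-first order is just an assumption for the example:

```c
#include <assert.h>

/* Hypothetical helper: stores the low 32 bits of value as four octets,
   most significant first. Each element stays in the range 0..255
   even if unsigned char is wider than 8 bits. */
void put_u32_octets(unsigned long value, unsigned char out[4])
{
    out[0] = (unsigned char)((value >> 24) & 0xFFu);
    out[1] = (unsigned char)((value >> 16) & 0xFFu);
    out[2] = (unsigned char)((value >> 8) & 0xFFu);
    out[3] = (unsigned char)(value & 0xFFu);
}

/* Hypothetical helper: rebuilds the value from four octets. */
unsigned long get_u32_octets(const unsigned char in[4])
{
    return ((unsigned long)(in[0] & 0xFFu) << 24)
         | ((unsigned long)(in[1] & 0xFFu) << 16)
         | ((unsigned long)(in[2] & 0xFFu) << 8)
         |  (unsigned long)(in[3] & 0xFFu);
}

/* Hypothetical helper: writes the value as octets and reads it back. */
unsigned long roundtrip_u32_octets(unsigned long value)
{
    unsigned char octets[4];

    put_u32_octets(value, octets);
    return get_u32_octets(octets);
}
```

Copying the object representation with memcpy instead would produce sizeof(unsigned long) bytes of CHAR_BIT bits each, which is not the same thing as a sequence of octets on a machine where bytes aren't 8 bits.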