Question
According to C11 WG14 draft version N1570:
The header <ctype.h> declares several functions useful for classifying and mapping characters. In all cases the argument is an int, the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF. If the argument has any other value, the behavior is undefined.
Is this undefined behaviour?
#include <ctype.h>
#include <limits.h>
#include <stdlib.h>
int main(void) {
    char c = CHAR_MIN; /* assume that char is signed, so CHAR_MIN < 0 */
    return isspace(c) ? EXIT_FAILURE : EXIT_SUCCESS;
}
Does the standard allow passing a char to isspace() (char to int)? In other words, is a char, after conversion to int, representable as an unsigned char?
Here's how Wiktionary defines "representable":
Capable of being represented.
Is a char capable of being represented as an unsigned char? Yes. §6.2.6.1/4:
Values stored in non-bit-field objects of any other object type consist of n × CHAR_BIT bits, where n is the size of an object of that type, in bytes. The value may be copied into an object of type unsigned char [n] (e.g., by memcpy); the resulting set of bytes is called the object representation of the value.
sizeof(char) == 1, therefore its object representation is unsigned char[1], i.e., a char is capable of being represented as an unsigned char. Where am I wrong?
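A minimal sketch of the §6.2.6.1/4 copy described above can show where the argument slips. The helper names `object_rep_byte` and `value_matches_object_rep` are hypothetical, not from the standard or the question. Copying a char into an unsigned char[1] with memcpy always works, but whenever the char is negative the unsigned char read back holds a different value, so having an object representation does not make the value itself representable as an unsigned char.

```c
#include <string.h>

/* Hypothetical helper: returns the single byte of c's object
   representation (§6.2.6.1/4), read as an unsigned char. */
unsigned char object_rep_byte(char c)
{
    unsigned char bytes[sizeof c]; /* sizeof(char) == 1, so unsigned char[1] */
    memcpy(bytes, &c, sizeof c);
    return bytes[0];
}

/* Hypothetical helper: nonzero if c's value is unchanged when its byte is
   read back as an unsigned char. This fails for every negative char,
   because an unsigned char value is never negative. */
int value_matches_object_rep(char c)
{
    return (int)object_rep_byte(c) == (int)c;
}
```

For example, where char is signed, value_matches_object_rep(CHAR_MIN) returns 0: the bytes were copied faithfully, but the value changed.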
Concrete example: I can represent [-2, -1, 0, 1] as [0, 1, 2, 3]. If I can't, then why?
Related: according to §6.3.1.3, isspace((unsigned char)c) is portable if INT_MAX >= UCHAR_MAX; otherwise it is implementation-defined.
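A common way to apply the cast consistently is a small wrapper; the name `isspace_char` below is hypothetical. Converting the char to unsigned char is always defined (§6.3.1.3/2), and the implicit conversion back to int when calling isspace() preserves the value on the usual platforms where INT_MAX >= UCHAR_MAX, which is the caveat mentioned above.

```c
#include <ctype.h>

/* Hypothetical wrapper: classify a plain char without ever passing a
   negative value to isspace(). The cast maps a negative char into
   [0, UCHAR_MAX], which is inside isspace()'s domain as long as int can
   represent UCHAR_MAX (INT_MAX >= UCHAR_MAX). */
int isspace_char(char c)
{
    return isspace((unsigned char)c);
}
```

In the question's program, isspace_char(c) would replace isspace(c). Since isspace returns true only for the standard white-space characters in the "C" locale, isspace_char(CHAR_MIN) returns 0 there whether char is signed or not.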
Answer 1:
Under the assumption that char is signed, this would be undefined behavior; otherwise it is well defined, since CHAR_MIN would have the value 0. It is easier to see the intention and meaning of:
the value of which shall be representable as an unsigned char or shall equal the value of the macro EOF
if we read section 7.4
Character handling <ctype.h> from the Rationale for International Standard—Programming Languages—C which says (emphasis mine going forward):
Since these functions are often used primarily as macros, their domain is restricted to the small positive integers representable in an unsigned char, plus the value of EOF. EOF is traditionally -1, but may be any negative integer, and hence distinguishable from any valid character code. These macros may thus be efficiently implemented by using the argument as an index into a small array of attributes.
So valid values are:
- Non-negative integers that fit into an unsigned char (0 through UCHAR_MAX)
- EOF, which is some implementation-defined negative number
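The table-lookup implementation the Rationale describes can be sketched as follows. This is illustrative only, not any particular C library's code: `my_isspace`, `attr_table` and `ATTR_SPACE` are hypothetical names, and the sketch assumes EOF == -1 (the traditional value), so the table has UCHAR_MAX + 2 entries indexed by ch + 1. Any argument below -1, such as a negative char like -128, would index before the start of the array, which is why such arguments are left undefined.

```c
#include <limits.h>

#define ATTR_SPACE 0x01 /* hypothetical attribute bit */

/* Index 0 is the entry for EOF (assumed to be -1); each unsigned char
   value 0..UCHAR_MAX lives at index value + 1. */
static const unsigned char attr_table[UCHAR_MAX + 2] = {
    [' '  + 1] = ATTR_SPACE,
    ['\t' + 1] = ATTR_SPACE,
    ['\n' + 1] = ATTR_SPACE,
    ['\v' + 1] = ATTR_SPACE,
    ['\f' + 1] = ATTR_SPACE,
    ['\r' + 1] = ATTR_SPACE,
};

/* Only valid for ch == -1 (EOF) or 0 <= ch <= UCHAR_MAX. For example,
   ch == -128 would read attr_table[-127]: out of bounds, undefined. */
int my_isspace(int ch)
{
    return (attr_table[ch + 1] & ATTR_SPACE) != 0;
}
```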
Even though this is the C99 rationale, the particular wording you are referring to does not change from C99 to C11, so the rationale still fits.
We can also find why the interface takes an int argument rather than a char in section 7.1.4 Use of library functions, which says:
All library prototypes are specified in terms of the “widened” types: an argument formerly declared as char is now written as int. This ensures that most library functions can be called with or without a prototype in scope, thus maintaining backwards compatibility with pre-C89 code. Note, however, that since functions like printf and scanf use variable-length argument lists, they must be called in the scope of a prototype.
Answer 2:
What does representable in a type mean?
Reformulated: a type is a convention for what the underlying bit patterns mean. A value is thus representable in a type if that type assigns that meaning to some bit pattern.
A conversion (which might need a cast) is a mapping from a value (represented in a specific type) to a possibly different value represented in the target type.
Under the given assumption (that char is signed), CHAR_MIN is certainly negative, and the text you quoted leaves no room for interpretation:
Yes, it is undefined behavior, as unsigned char cannot represent any negative numbers.
If that assumption did not hold, your program would be well-defined, because CHAR_MIN would be 0, a valid value for unsigned char.
Thus, we have a case where it is implementation-defined whether the program is undefined or well-defined.
As an aside, there is no guarantee that sizeof(int) > 1 or INT_MAX >= UCHAR_MAX, so int might not be able to represent all values possible for unsigned char.
As conversions preserve the value whenever the target type can represent it, a signed char can always be converted to int unchanged.
But if the value was negative, the conversion to int does not change the fact that a negative value cannot be represented as an unsigned char. (Converting it to unsigned char is still defined, as conversion from any integer type to any unsigned integer type is always defined, though narrowing conversions are usually written with an explicit cast to make the intent clear.)
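Both conversions can be checked directly. In this sketch the helper names `promote` and `to_uchar` are hypothetical. The char-to-int conversion keeps a negative value negative, while the int-to-unsigned char conversion follows §6.3.1.3/2 and adds UCHAR_MAX + 1 to a negative value. This also bears on the question's concrete example: -2 and -1 map to UCHAR_MAX - 1 and UCHAR_MAX, not to 0 and 1, and in either case the values change rather than being represented as they are.

```c
#include <limits.h>

/* char -> int: value-preserving whenever int can represent the char
   value, which is always the case when char is signed. */
int promote(char c)
{
    return c;
}

/* int -> unsigned char: always defined (§6.3.1.3/2). A negative v is
   brought into range by adding UCHAR_MAX + 1, so -1 becomes UCHAR_MAX
   and -2 becomes UCHAR_MAX - 1. */
unsigned char to_uchar(int v)
{
    return (unsigned char)v;
}
```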
Answer 3:
The revealing quote (for me) is §6.3.1.3/1:
if the value can be represented by the new type, it is unchanged.
i.e., if the value has to be changed, then it can't be represented by the new type.
Therefore an unsigned type can't represent a negative value.
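The §6.3.1.3/1 wording suggests a mechanical test: a value is representable as an unsigned char exactly when converting it to unsigned char and back leaves it unchanged. The sketch below (hypothetical function names) puts that round-trip test next to the direct range check 0 <= v <= UCHAR_MAX. It uses long long and assumes that type can represent UCHAR_MAX, which holds on common platforms.

```c
#include <limits.h>

/* Direct reading of "representable as an unsigned char". */
int in_uchar_range(long long v)
{
    return v >= 0 && v <= UCHAR_MAX;
}

/* §6.3.1.3/1 reading: the conversion leaves representable values
   unchanged and must change all others, so the round trip gives back
   v only when v is in range. */
int uchar_round_trips(long long v)
{
    unsigned char u = (unsigned char)v; /* always defined (§6.3.1.3/2) */
    return (long long)u == v;
}
```

Both functions agree for every input: -1 fails the round trip (it comes back as UCHAR_MAX), and so does UCHAR_MAX + 1 (it comes back as 0).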
To answer the question in the title: "representable" refers to "can be represented" from §6.3.1.3 and is unrelated to "object representation" from §6.2.6.1.
It seems trivial in retrospect. I might have been confused by the habit of treating b'\xFF', 0xff, 255, and -1 as the same byte in Python:
>>> (255).to_bytes(1, 'big')
b'\xff'
>>> int.from_bytes(b'\xFF', 'big')
255
>>> 255 == 0xff
True
>>> (-1).to_bytes(1, 'big', signed=True)
b'\xff'
and by disbelief that it is undefined behavior to pass a character to a character classification function, e.g., isspace(CHAR_MIN).
Source: https://stackoverflow.com/questions/25776824/what-does-representable-mean-in-c11