Question
Can I pass a negative int to printf with the %c format specifier, given that the int is converted to an unsigned char when it is printed? Is printf("%c", -65); valid? I tried it with GCC but got a diamond-like character (with a question mark inside) as output. Why?
Answer 1:
Absolutely yes. %c takes an int argument and converts it to unsigned char before writing it, so a negative value is valid. C allows char to be signed or unsigned, and in GCC you can switch between the two with -funsigned-char and -fsigned-char. When char is signed, the call is exactly the same thing as this:
char c = -65;
printf("%c", c);
When it is passed to printf, the char variable is sign-extended to int, so printf sees -65 just as if it had been passed as a constant. Because of the default argument promotions in variadic functions, printf simply has no way to differentiate between printf("%c", c); and printf("%c", -65);.
The printed result depends on the character encoding, though. For example, in the ISO-8859-1 or Windows-1252 charsets you'll see ¿ because (unsigned char)-65 == 0xBF. In UTF-8 (which is a variable-length encoding), 0xBF is a continuation byte and is not allowed in the starting position of a character. That's why you see �, which is the replacement character for invalid bytes.
Please tell me why the code points 0 to 255 are not mapped to 0 to 255 in unsigned char. I mean, they are non-negative, so shouldn't I just look through the UTF-8 character set for their corresponding values?
The mapping is not done by relative position in the range as you thought. That is, code point 0 does not map to CHAR_MIN, code point 40 to CHAR_MIN + 40, code point 255 to CHAR_MAX, and so on. On two's complement systems it's typically a simple mapping based on the value of the bit pattern when treated as unsigned. That's because of the way values are usually truncated from a wider type. In C, a character literal like 'a' has type int. Suppose 'a' is mapped to code point 130 in some theoretical character set; then the two lines below are equivalent:
char c = 'a';
char c = 130;
Either way, c is assigned the value of 'a' after conversion to char, i.e. (char)'a', which may be a negative value.
So code points 0 to 255 are mapped to 0 to 255 in unsigned char. That means code point 0x1F will be stored in a char (signed or unsigned) with value 0x1F. Code point 0xBF will be stored as 0xBF if char is unsigned and as -65 if char is signed.
I'm assuming 8-bit char for all of the above. Also note that UTF-8 is an encoding for the Unicode character set, not a charset of its own, so you can't look up UTF-8 code points.
Source: https://stackoverflow.com/questions/61660739/can-c-be-given-a-negative-int-argument-in-printf