Question
Assuming the following:
sizeof(char) = 1
sizeof(short) = 2
sizeof(int) = 4
sizeof(long) = 8
The printf format for a 2-byte signed number is %hd, for a 4-byte signed number is %d, and for an 8-byte signed number is %ld, but what is the correct format for a 1-byte signed number?
Answer 1:
what is the correct format for a 1 byte signed number?
Use the hh length modifier together with the integer conversion specifier of your choice: %hhd for signed decimal or, for example, %02hhX for hexadecimal. See the C11 standard, §7.21.6.1p7:
hh  Specifies that a following d, i, o, u, x, or X conversion specifier applies to a signed char or unsigned char argument (the argument will have been promoted according to the integer promotions, but its value shall be converted to signed char or unsigned char before printing);…
The parenthesized comment is important. Because of integer promotions on the arguments to variadic functions (such as printf), the function never sees a char argument. Many programmers conclude from this that the h and hh modifiers are unnecessary. Certainly, you are not creating undefined behaviour by leaving them out, and most of the time it will work.
However, char may well be signed, and the integer promotion preserves its value, so a negative char becomes a negative int. Printing that int with an unsigned format (such as %02X) will present you with the sign-extended Fs. So if you want to display a signed char using an unsigned format, you need to tell printf what the original unpromoted width of the integer type was, using hh.
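As a minimal sketch of the direct answer to the question, the helper below formats a 1-byte signed value in decimal with %hhd. The name format_int8 is hypothetical, and snprintf is used instead of printf only so that the result can be inspected.

```c
#include <stdio.h>

/* Hypothetical helper: formats a 1-byte signed value as decimal using the
   hh length modifier. Returns what snprintf returns: the number of
   characters needed, excluding the terminating null. */
int format_int8(char *buf, size_t size, signed char value) {
    /* value is promoted to int for the call; hh tells snprintf to
       convert it back to signed char before printing. */
    return snprintf(buf, size, "%hhd", value);
}
```

For example, format_int8(buf, sizeof buf, -5) stores the text -5 in buf and returns 2.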
In case that wasn't clear, a simple (but controversial) example:
/* Assumes a UTF-8 execution character set and a signed plain char,
   as on the implementation that produced the output below. */
#include <stdio.h>

int main(void) {
    const char *s = "\u00d1"; /* Ñ, encoded in UTF-8 as the bytes C3 91 */
    for (const char *p = s; *p; ++p)
        printf("%02X (%02hhX)\n", *p, *p);
    return 0;
}
Output:
$ ./a.out
FFFFFFC3 (C3)
FFFFFF91 (91)
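For comparison, here is a hedged sketch of the two ways to get the two-digit form: casting the byte before the call, or letting the hh modifier do the narrowing. The function names are hypothetical, and the second helper relies on the same reading of the standard that is argued for in this answer.

```c
#include <stdio.h>

/* Hypothetical helper: narrows explicitly before the call. The cast to
   unsigned char discards the sign; the cast to unsigned int gives the
   argument exactly the type that %X expects. */
int hex_byte_cast(char *buf, size_t size, char c) {
    return snprintf(buf, size, "%02X", (unsigned int)(unsigned char)c);
}

/* Hypothetical helper: passes the (possibly negative) promoted int and
   lets the hh modifier convert it to unsigned char before printing. */
int hex_byte_hh(char *buf, size_t size, char c) {
    return snprintf(buf, size, "%02hhX", c);
}
```

Both helpers should produce C3 for the first byte of the string above, whether plain char is signed or unsigned.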
In the comment thread, there is (or possibly was) considerable discussion about whether the above snippet is undefined behaviour because the X format specification requires an unsigned argument, whereas the char argument is (at least on the implementation which produced the presented output) signed. I think this argument relies on §7.21.6.1/p9: "If any argument is not the correct type for the corresponding conversion specification, the behavior is undefined."
However, in the case of the char (and short) integer types, the expression in the argument list is promoted to int or unsigned int before the function is called. (It's worth noting that on most architectures, all three character types will be promoted to a signed int; promotion of an unsigned char to an unsigned int will only happen on an implementation where sizeof(int) == 1.)
So on most architectures, the argument to an %hx or an %hhx format conversion will be signed, and that cannot be undefined behaviour without rendering the use of these format codes meaningless.
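The promoted type can be checked at compile time. The sketch below assumes a C11 compiler (for _Generic) and uses a hypothetical macro name; unary plus applies the integer promotions, which for integer arguments are the same promotions a variadic call applies.

```c
/* Hypothetical macro: evaluates to 1 if the expression x has type int
   after the integer promotions, and to 0 otherwise. Requires C11. */
#define PROMOTES_TO_INT(x) _Generic(+(x), int: 1, default: 0)
```

On a typical implementation where int is wider than short, PROMOTES_TO_INT yields 1 for signed char, unsigned char and short operands, and 0 for operands that are already unsigned int or long.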
Furthermore, the standard does not say that fprintf (and friends) will somehow recover the original expression. What it says is that the value "shall be converted to signed char or unsigned char before printing" (§7.21.6.1/p7, quoted above).
Converting a signed value to an unsigned value is not undefined. It is not even unspecified or implementation-dependent. It simply consists of (conceptually) "repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type." (§6.3.1.3/p2)
So there is a well-defined procedure to convert the argument expression to a (possibly signed) int argument, and a well-defined procedure for converting that value to an unsigned char. I therefore argue that a program such as the one presented above is entirely well-defined.
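The two steps described above can be sketched in ordinary C. The function name is hypothetical, and the expected values assume an 8-bit char (CHAR_BIT == 8).

```c
/* Hypothetical helper mirroring the two conversions:
   1. the signed char is promoted to int, preserving its value;
   2. the int is converted to unsigned char, i.e. reduced modulo
      UCHAR_MAX + 1 (C11 §6.3.1.3/p2). */
unsigned char promote_then_narrow(signed char c) {
    int promoted = c;               /* step 1: integer promotion */
    return (unsigned char)promoted; /* step 2: well-defined narrowing */
}
```

With an 8-bit char, promote_then_narrow(-61) yields 195 (0xC3), because -61 + 256 = 195.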
For corroboration, the behaviour of fprintf given a format specifier %c is defined as follows (§7.21.6.1/p8):
"the int argument is converted to an unsigned char, and the resulting character is written."
If one were to apply the proposed restrictive interpretation which renders the above program undefined, then I believe that one would be forced to also argue that:
#include <stdio.h>

void f(char c) {
    printf("This is a '%c'.\n", c);
}
was also UB. Yet, I think almost every C programmer has written something similar to that without thinking twice about it.
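A small sketch of what %c does with a char whose value may be negative. The helper name is hypothetical, and snprintf stands in for printf so the written byte can be examined.

```c
#include <stdio.h>

/* Hypothetical helper: writes c with %c and returns the byte that was
   stored, viewed as an unsigned char. Returns -1 if snprintf did not
   report exactly one character. */
int byte_written_by_percent_c(char c) {
    char buf[2];
    if (snprintf(buf, sizeof buf, "%c", c) != 1)
        return -1;
    return (unsigned char)buf[0];
}
```

Where plain char is signed, '\xC3' reaches snprintf as the int -61, yet the byte written is still 0xC3, which matches the conversion to unsigned char that the standard describes.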
The key part of the question is what is meant by "argument" in §7.21.6.1/p9 (and other parts of §7.21.6.1). The C++ standard is slightly more precise; it specifies that if an argument is subject to the default argument promotions, "the value of the argument is converted to the promoted type before the call", which I interpret to mean that when considering the call (for example, the call of fprintf), the arguments are now the promoted values.
I don't think C is actually different, at least in intent. It uses wording like "the arguments… are promoted", and in at least one place "the argument after promotion". Furthermore, in the description of variadic functions (the va_arg macro, §7.16.1.1), the constraint on the argument type is annotated parenthetically "the type of the actual next argument (as promoted according to the default argument promotions)".
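To illustrate the va_arg point, here is a minimal sketch of a user-written variadic function. The name sum_small_ints is hypothetical, and the code assumes the usual case where char and short promote to int.

```c
#include <stdarg.h>

/* Hypothetical variadic function: sums `count` arguments that the caller
   passes as char, signed char or short. Because of the default argument
   promotions they all arrive as int, so va_arg must name int. */
int sum_small_ints(int count, ...) {
    va_list ap;
    int total = 0;

    va_start(ap, count);
    for (int i = 0; i < count; ++i)
        total += va_arg(ap, int); /* va_arg(ap, char) would be undefined */
    va_end(ap);
    return total;
}
```

The callee has no way to ask for the unpromoted type, which is why printf needs the hh modifier to learn the intended width.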
I'll freely agree that all of this is (a) subtle reading of insufficiently precise language, and (b) counting dancing angels. But I don't see any value in declaring that standard usages like the use of %c with char arguments are "technically" UB; that denatures the concept of UB and it is hard to believe that such a prohibition would be intentional, which leads me to believe that the interpretation was not intended. (And, perhaps, should be corrected editorially.)
Source: https://stackoverflow.com/questions/28387867/printf-format-for-1-byte-signed-number