Question
I'm quite new to C programming, and I have some problems trying to assign a value over 127 (0x7F) in a char array. In my program, I work with generic binary data and I don't face any problem printing a previously acquired byte stream (e.g. with fopen or fgets, then processed with some bitwise operations) as %c or %d.
But if I try to print a character from its numerical value like this:
printf("%c\n", 128);
it just prints FFFD (the replacement character).
Here is another example:
char abc[] = {126, 128, '\0'}; // Manually assigning values
printf("%c", abc[0]); // Prints "~", as expected
printf("%c", 121); // Prints "y"
printf("%c", abc[1]); // Should print "€", I think, but I get "�"
I'm a bit confused since I can just print every character below 128 in these ways.
The reason I'm asking is that I need to generate a (pseudo)random byte sequence using the rand() function.
Here is an example:
char abc[10];
srand(time(NULL));
abc[0] = rand() % 256; // Gives something between 00:FF ...
printf("%c", abc[0]); // ... but I get "�"
If this is of any help, the source code is encoded in UTF-8, but changing encoding doesn't have any effect.
Answer 1:
In C, char is a distinct type from unsigned char and signed char. Its range is CHAR_MIN to CHAR_MAX, and it has the same range as one of unsigned char or signed char. Typically these are 8-bit types, but they could be wider; see CHAR_BIT. So the typical range is [0 to 255] or [-128 to 127].
If char is unsigned, abc[1] = 128 is fine. If char is signed, abc[1] = 128 is implementation-defined (see below). The typical implementation-defined result is that abc[1] has the value -128.
printf("%c\n", 128); will send the int value 128 to printf(). The "%c" will convert that value to an unsigned char. So far no problems. What appears on the output depends on how the output device handles code 128. Perhaps Ç, perhaps something else.
printf("%c", abc[1]); will send either 128 or the implementation-defined value. If the implementation-defined value -128 was sent, then converting -128 to unsigned char gives 128, and again the code for 128 is printed.
If the output device is expecting UTF-8 sequences, a UTF-8 sequence beginning with code 128 is invalid (it is an unexpected continuation byte), and many such systems will print the replacement character, which is Unicode U+FFFD.
Converting a value outside the range of a signed char to char (when char is signed) invokes:
the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised. C11dr §6.3.1.3 3
Answer 2:
First of all, the signedness of char is implementation-defined.
If you have to deal with char values over 127, you can use unsigned char. It can handle 0-255.
Also, you should use the %hhu format specifier to print the value of an unsigned char.
Answer 3:
If you're dealing with bytes, use unsigned char instead of char for your datatypes.
With regard to printing, you can print the bytes in hex instead of decimal or as characters:
printf("%02X", abc[0]);
You probably don't want to print these bytes as characters, as you'll most likely be dealing with UTF-8 character encoding which doesn't seem to be what you're looking for.
Source: https://stackoverflow.com/questions/35209652/dealing-with-char-values-over-127-in-c