Dealing with char values over 127 in C

Submitted by 醉酒当歌 on 2021-01-28 04:51:25

Question


I'm quite new to C programming, and I have some problems trying to store a value over 127 (0x7F) in a char array. In my program, I work with generic binary data, and I have no trouble printing a previously acquired byte stream (e.g. with fopen or fgets, then processed with some bitwise operations) as %c or %d.
But if I try to print a character from its numerical value like this:

printf("%c\n", 128);

it just prints U+FFFD (the replacement character).
Here is another example:

char abc[] = {126, 128, '\0'}; // Manually assigning values
printf("%c", abc[0]); // Prints "~", as expected
printf("%c", 121); // Prints "y"
printf("%c", abc[1]); // Should print "€", I think, but I get "�"

I'm a bit confused since I can just print every character below 128 in these ways.
The reason I'm asking is that I need to generate a (pseudo)random byte sequence using the rand() function.
Here is an example:

char abc[10];
srand(time(NULL));
abc[0] = rand() % 256; // Gives something between 00:FF ...
printf("%c", abc[0]); // ... but I get "�"

If this is of any help, the source code is encoded in UTF-8, but changing encoding doesn't have any effect.


Answer 1:


In C, char is a distinct type from both unsigned char and signed char. Its range is CHAR_MIN to CHAR_MAX, and that range matches either unsigned char or signed char, depending on the implementation. These are typically 8-bit types, but they can be wider; see CHAR_BIT. So the typical range is [0, 255] or [-128, 127].

If char is unsigned, abc[1] = 128 is fine. If char is signed, the result of abc[1] = 128 is implementation-defined (see below). On typical implementations, abc[1] ends up with the value -128.
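A minimal sketch of how to check which case applies on a given compiler. The helper names char_is_signed and stored_char_value are made up for illustration; only CHAR_MIN from <limits.h> is standard.

```c
#include <limits.h>

/* Returns 1 if plain char is signed on this implementation, 0 otherwise. */
int char_is_signed(void)
{
    return CHAR_MIN < 0;
}

/* Stores v in a plain char and reports what the char then holds.
   For v > CHAR_MAX with a signed char the result is
   implementation-defined; for v = 128, -128 is the usual outcome on
   8-bit two's complement systems. */
int stored_char_value(int v)
{
    char c = (char)v;
    return c;
}
```

On a platform where char is signed, printf("%d\n", stored_char_value(128)); typically prints -128; where char is unsigned it prints 128.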

printf("%c\n", 128); passes the int value 128 to printf(). The "%c" specifier converts that value to unsigned char. So far, no problems. What appears on the output depends on how the output device handles code 128: perhaps Ç, perhaps something else.

printf("%c", abc[1]); passes 128 if char is unsigned; if char is signed, the value passed is the implementation-defined one. If that value is -128, converting -128 to unsigned char gives 128, and again the code for 128 is printed.

If the output device expects UTF-8 sequences, a sequence beginning with byte 128 is invalid (it is an unexpected continuation byte), and many such systems print the replacement character, Unicode U+FFFD.
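As a rough illustration, byte 0x80 is "€" in Windows-1252 and "Ç" in code page 437, but in UTF-8 it can only appear as a continuation byte. The sketch below classifies a byte and encodes a code point as UTF-8; is_utf8_continuation and utf8_encode are illustrative names, not library functions.

```c
#include <stddef.h>

/* UTF-8 continuation bytes have the bit pattern 10xxxxxx (0x80..0xBF). */
int is_utf8_continuation(unsigned char b)
{
    return (b & 0xC0) == 0x80;
}

/* Encodes code point cp into out (up to 4 bytes).
   Returns the number of bytes written, or 0 if cp is not encodable
   (a surrogate, or above U+10FFFF). */
size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0; /* surrogates are not valid code points to encode */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

To show "€" (U+20AC) on a UTF-8 terminal, one would write the three bytes produced by utf8_encode(0x20AC, buf), for example with fwrite(buf, 1, n, stdout), rather than the single byte 0x80.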


Converting a value outside the range of a signed char to char invokes:

the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised. C11dr §6.3.1.3 3




Answer 2:


First of all, the signedness of char is implementation-defined.

If you have to deal with char values over 127, you can use unsigned char. It can handle 0-255.

Also, you should be using the %hhu format specifier to print the value of an unsigned char.
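A minimal sketch of this advice applied to the random-byte use case from the question. The function names fill_random_bytes and byte_to_decimal are hypothetical, and the caller is assumed to have seeded rand() with srand() as in the question.

```c
#include <stdio.h>
#include <stdlib.h>

/* Fills buf with n pseudorandom bytes in the range 0..255.
   The caller is expected to call srand() beforehand. */
void fill_random_bytes(unsigned char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        buf[i] = (unsigned char)(rand() % 256);
}

/* Formats one byte as a decimal number using %hhu.
   Returns what snprintf returns: the number of characters needed. */
int byte_to_decimal(unsigned char b, char *out, size_t outsz)
{
    return snprintf(out, outsz, "%hhu", b);
}
```

Because the buffer is unsigned char, every element stays in 0..255 and no implementation-defined conversion is involved when storing values above 127.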




Answer 3:


If you're dealing with bytes, use unsigned char instead of char for your datatypes.

With regard to printing, you can print the bytes in hex instead of decimal or as characters:

printf("%02X", abc[0]); // assumes abc is an array of unsigned char

You probably don't want to print these bytes as characters, as you'll most likely be dealing with the UTF-8 character encoding, which doesn't seem to be what you're looking for.
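The hex approach can be sketched as a small helper that formats a whole buffer. The name bytes_to_hex is made up for illustration. If the data lives in a plain char array instead, each element should be cast to unsigned char first, since a negative char passed to "%02X" typically prints with leading F digits (for example FFFFFF80).

```c
#include <stddef.h>
#include <stdio.h>

/* Writes n bytes as uppercase hex digits into out, which must hold
   at least 2 * n + 1 chars. Returns 0 on success, -1 if out is too small. */
int bytes_to_hex(const unsigned char *data, size_t n, char *out, size_t outsz)
{
    if (outsz == 0 || n > (outsz - 1) / 2)
        return -1;
    for (size_t i = 0; i < n; i++)
        sprintf(out + 2 * i, "%02X", (unsigned)data[i]);
    out[2 * n] = '\0';
    return 0;
}
```

For the bytes 126, 128 and 255 this produces the string 7E80FF, which is unambiguous regardless of the terminal's character encoding.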



Source: https://stackoverflow.com/questions/35209652/dealing-with-char-values-over-127-in-c
