I tried to execute the program below:
#include <stdio.h>

int main() {
    signed char a = -5;
    unsigned char b = -5;
    int c = -5;
    unsigned int d = -5;

    if (a == b)
        printf("\r\n char is SAME!!!");
    else
        printf("\r\n char is DIFF!!!");

    if (c == d)
        printf("\r\n int is SAME!!!");
    else
        printf("\r\n int is DIFF!!!");

    return 0;
}
For this program, I am getting the output:
char is DIFF!!! int is SAME!!!
Why do the two comparisons give different results? Shouldn't the output be as below?
char is SAME!!! int is SAME!!!
This is because of the various implicit type conversion rules in C. There are two of them that a C programmer must know: the usual arithmetic conversions and the integer promotions (the latter are part of the former).
In the char case you have the types (signed char) == (unsigned char). These are both small integer types. Other such small integer types are bool and short. The integer promotion rules state that whenever a small integer type is an operand of an operation, its type gets promoted to int, which is signed. This happens regardless of whether the type was signed or unsigned.
In the case of the signed char, the sign is preserved and it is promoted to an int containing the value -5. In the case of the unsigned char, it contains the value 251 (0xFB). It is promoted to an int containing that same value. You end up with
if( (int)-5 == (int)251 )
In the integer case you have the types (signed int) == (unsigned int). They are not small integer types, so the integer promotions do not apply. Instead, they are balanced by the usual arithmetic conversions, which state that if two operands have the same "rank" (size) but different signedness, the signed operand is converted to the same type as the unsigned one. You end up with
if( (unsigned int)-5 == (unsigned int)-5)
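To see both conversions in action, here is a minimal sketch (assuming an 8-bit char and a 32-bit int, so 251 and 4294967291 are the converted values) that prints the operands as the compiler actually compares them:

#include <stdio.h>

int main(void) {
    signed char a = -5;
    unsigned char b = -5;     /* holds 251 when char is 8 bits */
    int c = -5;
    unsigned int d = -5;      /* holds UINT_MAX - 4 */

    /* Integer promotions: both a and b become int before == is applied. */
    printf("(int)a = %d, (int)b = %d\n", (int)a, (int)b);        /* -5 vs 251 */

    /* Usual arithmetic conversions: c is converted to unsigned int. */
    printf("(unsigned)c = %u, d = %u\n", (unsigned int)c, d);    /* 4294967291 vs 4294967291 */

    return 0;
}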
Cool question!
The int comparison works because both ints contain exactly the same bits, so they are essentially the same. But what about the chars?
Ah, C implicitly promotes chars to ints on various occasions. This is one of them. Your code says if (a == b), but what the compiler actually turns that into is:
if((int)a==(int)b)
(int)a is -5, but (int)b is 251. Those are definitely not the same.
EDIT: As @Carbonic-Acid pointed out, (int)b is 251 only if a char is 8 bits long. If int is 32 bits long, (int)b is -32764.
REDIT: There's a whole bunch of comments discussing the nature of the answer if a byte is not 8 bits long. The only difference in this case is that (int)b is not 251 but a different positive number, which isn't -5. This is not really relevant to the question, which is still very cool.
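If you want to check what your own platform does, a small sketch (using only the standard <limits.h> macros) prints the relevant quantities:

#include <limits.h>
#include <stdio.h>

int main(void) {
    unsigned char b = -5;   /* always holds UCHAR_MAX - 4 */

    /* With the common 8-bit char, CHAR_BIT is 8, UCHAR_MAX is 255 and
       (int)b prints 251; on a platform with wider chars the numbers differ,
       as the REDIT above notes. */
    printf("CHAR_BIT = %d, UCHAR_MAX = %u, (int)b = %d\n",
           CHAR_BIT, (unsigned int)UCHAR_MAX, (int)b);

    return 0;
}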
Welcome to integer promotion. If I may quote the standard's description of the integer promotions:
If an int can represent all values of the original type, the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions. All other types are unchanged by the integer promotions.
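The quoted rule can be observed directly. The following sketch (assuming a C11 compiler for _Generic, and an int wider than char) shows that the result of arithmetic on two unsigned chars already has type int:

#include <stdio.h>

/* Map an expression's type to a printable name (only the cases we care about). */
#define TYPE_NAME(x) _Generic((x),          \
    unsigned char: "unsigned char",         \
    int:           "int",                   \
    unsigned int:  "unsigned int",          \
    default:       "something else")

int main(void) {
    unsigned char uc = 200;

    printf("uc      : %s\n", TYPE_NAME(uc));        /* unsigned char */
    printf("uc + uc : %s\n", TYPE_NAME(uc + uc));   /* int - both operands promoted */
    printf("uc + 0u : %s\n", TYPE_NAME(uc + 0u));   /* unsigned int - usual arithmetic conversions */

    return 0;
}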
C can be really confusing when you do comparisons such as these; I recently puzzled some of my non-C-programming friends with the following teaser:
#include <stdio.h>
#include <string.h>

int main()
{
    char* string = "One looooooooooong string";

    printf("%zu\n", strlen(string));   /* %zu is the correct specifier for size_t */
    if (strlen(string) < -1) printf("This cannot be happening :(");

    return 0;
}
Which indeed does print This cannot be happening :( and seemingly demonstrates that 25 is smaller than -1!
What happens underneath, however, is that -1 is converted to an unsigned integer (size_t, the type returned by strlen), and after that conversion it is equal to 4294967295 on a 32-bit system. And naturally 25 is smaller than 4294967295.
If, however, we explicitly cast the size_t value returned by strlen to a signed integer:
if ((int)(strlen(string)) < -1)
Then it will compare 25 against -1 and all will be well with the world.
A good compiler should warn you about the comparison between an unsigned and signed integer and yet it is still so easy to miss (especially if you don't enable warnings).
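To make the converted value visible, a tiny sketch (assuming <stdint.h> and the %zu length modifier, i.e. C99 or later) prints what -1 becomes once it is dragged up to size_t:

#include <stdint.h>
#include <stdio.h>

int main(void) {
    /* Converting -1 to an unsigned type wraps modulo the type's range,
       so it becomes the largest value size_t can hold. */
    printf("(size_t)-1 = %zu\n", (size_t)-1);
    printf("SIZE_MAX   = %zu\n", (size_t)SIZE_MAX);

    return 0;
}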
This is especially confusing for Java programmers as all primitive types there are signed. Here's what James Gosling (one of the creators of Java) had to say on the subject:
Gosling: For me as a language designer, which I don't really count myself as these days, what "simple" really ended up meaning was could I expect J. Random Developer to hold the spec in his head. That definition says that, for instance, Java isn't -- and in fact a lot of these languages end up with a lot of corner cases, things that nobody really understands. Quiz any C developer about unsigned, and pretty soon you discover that almost no C developers actually understand what goes on with unsigned, what unsigned arithmetic is. Things like that made C complex. The language part of Java is, I think, pretty simple. The libraries you have to look up.
The hex representation of -5 is:

- 8-bit, two's complement signed char: 0xfb
- 32-bit, two's complement signed int: 0xfffffffb
When you convert a signed number to an unsigned number, or vice versa, the compiler does ... precisely nothing. What is there to do? Conversion to an unsigned type is defined to wrap modulo the type's range, and an out-of-range conversion to a signed type gives an implementation-defined result; on a two's complement machine the most efficient implementation-defined behaviour is to keep the bits as they are and do nothing.
So, the hex representation of (unsigned <type>)-5 is:

- 8-bit unsigned char: 0xfb
- 32-bit unsigned int: 0xfffffffb
Look familiar? They're bit-for-bit the same as the signed versions.
When you write if (a == b), where a and b are of type char, what the compiler is actually required to read is if ((int)a == (int)b). (This is that "integer promotion" that everyone else is banging on about.)
So, what happens when we convert char to int?

- 8-bit signed char to 32-bit signed int: 0xfb -> 0xfffffffb
  - Well, that makes sense because it matches the representations of -5 above!
  - It's called a "sign-extend", because it copies the top bit of the byte, the "sign-bit", leftwards into the new, wider value.
- 8-bit unsigned char to 32-bit signed int: 0xfb -> 0x000000fb
  - This time it does a "zero-extend" because the source type is unsigned, so there is no sign-bit to copy.
So, a == b really does 0xfffffffb == 0x000000fb => no match!

And, c == d really does 0xfffffffb == 0xfffffffb => match!
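A short sketch (assuming 8-bit char and 32-bit int, as in the tables above) prints those bit patterns so you can watch the sign-extend and zero-extend happen:

#include <stdio.h>

int main(void) {
    signed char a = -5;
    unsigned char b = -5;
    int c = -5;
    unsigned int d = -5;

    /* %x expects an unsigned int, hence the casts. */
    printf("(int)a: 0x%08x\n", (unsigned int)(int)a);   /* 0xfffffffb - sign-extended */
    printf("(int)b: 0x%08x\n", (unsigned int)(int)b);   /* 0x000000fb - zero-extended */
    printf("c     : 0x%08x\n", (unsigned int)c);        /* 0xfffffffb */
    printf("d     : 0x%08x\n", d);                      /* 0xfffffffb */

    return 0;
}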
My point is: didn't you get a warning at compile time "comparing signed and unsigned expression"?
The compiler is trying to inform you that it is entitled to do crazy stuff! :) I would add that the crazy stuff happens with big values, close to the capacity of the primitive type. And

unsigned int d = -5;

definitely assigns a big value to d; it is equivalent (and this equivalence is guaranteed by the standard's modular conversion rule) to:

unsigned int d = UINT_MAX - 4; // since (unsigned int)-1 == UINT_MAX
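A one-line check of that equivalence (nothing assumed beyond the standard headers):

#include <assert.h>
#include <limits.h>

int main(void) {
    unsigned int d = -5;

    /* Conversion to an unsigned type is defined modulo UINT_MAX + 1,
       so this assertion holds on every conforming implementation. */
    assert(d == UINT_MAX - 4);

    return 0;
}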
Edit: However, it is interesting to notice that only the second comparison gives a warning (check the code). So the compiler, applying the conversion rules, is confident that there won't be errors in the comparison between unsigned char and char: during the comparison both will be converted to a type (int) that can safely represent all of their possible values. And it is right on this point. It then informs you that this won't be the case for unsigned int and int: during that comparison, one of the two will be converted to a type that cannot fully represent it.
For completeness, I checked it also for short: the compiler behaves in the same way as for chars and, as expected, there are no errors at runtime (a small sketch below reproduces this).

Related to this topic, I recently asked this question (though it is C++ oriented).
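For reference, a minimal sketch of the short case mentioned above (assuming, as is typical, a 16-bit short and a 32-bit int, so the unsigned short holds 65531):

#include <stdio.h>

int main(void) {
    signed short a = -5;
    unsigned short b = -5;   /* holds USHRT_MAX - 4, e.g. 65531 */

    /* Both operands are promoted to int, so this compares -5 with 65531:
       DIFF, exactly like the char case. */
    if (a == b)
        printf("short is SAME!!!\n");
    else
        printf("short is DIFF!!!\n");

    return 0;
}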
Source: https://stackoverflow.com/questions/17312545/type-conversion-unsigned-to-signed-int-char