Type conversion - unsigned to signed int/char

試著忘記壹切 submitted on 2019-11-26 00:56:02

This is because of the various implicit type conversion rules in C. There are two of them that a C programmer must know: the usual arithmetic conversions and the integer promotions (the latter are part of the former).

In the char case you have the types (signed char) == (unsigned char). These are both small integer types. Other such small integer types are bool and short. The integer promotion rules state that whenever a small integer type is an operand of an operation, its type is promoted to int, which is signed, as long as int can represent every value of the original type (otherwise it becomes unsigned int). On common platforms, where int is wider than char and short, this happens regardless of whether the original type was signed or unsigned.

In the case of the signed char, the sign is preserved and it is promoted to an int containing the value -5. In the case of the unsigned char, it contains the value 251 (0xFB), and it is promoted to an int containing that same value. You end up with

if( (int)-5 == (int)251 )

In the integer case you have the types (signed int) == (unsigned int). They are not small integer types, so the integer promotions do not apply. Instead, they are balanced by the usual arithmetic conversions, which state that if two operands have the same "rank" (roughly, size) but different signedness, the signed operand is converted to the type of the unsigned one. You end up with

if( (unsigned int)-5 == (unsigned int)-5)

Cool question!

The int comparison works because both ints contain exactly the same bits, so they are essentially the same. But what about the chars?

Ah, C implicitly promotes chars to ints on various occasions. This is one of them. Your code says if(a==b), but what the compiler actually turns that into is:

if((int)a==(int)b) 

(int)a is -5, but (int)b is 251. Those are definitely not the same.

EDIT: As @Carbonic-Acid pointed out, (int)b is 251 only if a char is 8 bits long. With a 16-bit char, for example, b holds 65531 instead.

REDIT: There's a whole bunch of comments discussing what the answer would be if a byte is not 8 bits long. The only difference in that case is that (int)b is not 251 but a different positive number, which still isn't -5. This is not really relevant to the question, which is still very cool.

Welcome to integer promotion. If I may quote from the website:

If an int can represent all values of the original type, the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions. All other types are unchanged by the integer promotions.

C can be really confusing when you do comparisons such as these. I recently puzzled some of my non-C-programming friends with the following teaser:

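#include <stdio.h>
#include <string.h>

int main(void)
{
    const char *string = "One looooooooooong string";

    /* strlen returns size_t, so print it with %zu, not %d */
    printf("%zu\n", strlen(string));

    /* -1 is converted to size_t here, i.e. to SIZE_MAX */
    if (strlen(string) < -1) printf("This cannot be happening :(\n");

    return 0;
}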

Which indeed does print This cannot be happening :( and seemingly demonstrates that 25 is smaller than -1!

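What happens underneath, however, is that -1 is converted to size_t, the unsigned type that strlen returns. Conversion to an unsigned type is done modulo 2^N, so -1 becomes the largest size_t value, which is 4294967295 on a system with a 32-bit size_t (18446744073709551615 with a 64-bit size_t). And naturally 25 is smaller than 4294967295.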

If, however, we explicitly cast the size_t value returned by strlen to a signed integer:

if ((int)(strlen(string)) < -1)

Then it will compare 25 against -1 and all will be well with the world.

A good compiler should warn you about the comparison between an unsigned and a signed integer, and yet it is still so easy to miss (especially if you don't enable warnings).

This is especially confusing for Java programmers, as all of Java's integer primitive types except char are signed. Here's what James Gosling (one of the creators of Java) had to say on the subject:

Gosling: For me as a language designer, which I don't really count myself as these days, what "simple" really ended up meaning was could I expect J. Random Developer to hold the spec in his head. That definition says that, for instance, Java isn't -- and in fact a lot of these languages end up with a lot of corner cases, things that nobody really understands. Quiz any C developer about unsigned, and pretty soon you discover that almost no C developers actually understand what goes on with unsigned, what unsigned arithmetic is. Things like that made C complex. The language part of Java is, I think, pretty simple. The libraries you have to look up.

The hex representation of -5 is:

  • 8-bit, two's complement signed char: 0xfb
  • 32-bit, two's complement signed int: 0xfffffffb

When you convert a signed number to an unsigned number of the same width, or vice versa, the compiler (on a two's-complement machine) does ... precisely nothing. What is there to do? Signed-to-unsigned conversion is defined as wrapping modulo 2^N, which leaves a two's-complement bit pattern unchanged. Unsigned-to-signed conversion of a value that doesn't fit is implementation-defined (or raises an implementation-defined signal), and the most efficient implementation-defined behaviour is to do nothing.

So, the hex representation of (unsigned <type>)-5 is:

  • 8-bit, unsigned char: 0xfb
  • 32-bit, unsigned int: 0xfffffffb

Look familiar? They're bit-for-bit the same as the signed versions.

When you write if (a == b), where a and b are of type char, what the compiler is actually required to read is if ((int)a == (int)b). (This is that "integer promotion" that everyone else is banging on about.)

So, what happens when we convert char to int?

  • 8-bit signed char to 32-bit signed int: 0xfb -> 0xfffffffb
    • Well, that makes sense because it matches the representations of -5 above!
    • It's called a "sign-extend", because it copies the top bit of the byte, the "sign-bit", leftwards into the new, wider value.
  • 8-bit unsigned char to 32-bit signed int: 0xfb -> 0x000000fb
    • This time it does a "zero-extend" because the source type is unsigned, so there is no sign-bit to copy.

So, a == b really does 0xfffffffb == 0x000000fb => no match!

And, c == d really does 0xfffffffb == 0xfffffffb => match!

Antonio

My point is: didn't you get a warning at compile time "comparing signed and unsigned expression"?

The compiler is trying to tell you that it is entitled to do crazy stuff! :) I would add that the crazy stuff shows up with big values, close to the capacity of the primitive type. And

 unsigned int d = -5;

definitely assigns a big value to d. It is equivalent (and this is guaranteed, because conversion to an unsigned type is done modulo UINT_MAX + 1) to:

 unsigned int d = UINT_MAX - 4; // since (unsigned int)-1 is UINT_MAX

Edit:

However, it is interesting to notice that only the second comparison gives a warning. So the compiler, applying the conversion rules, is confident that there won't be errors in the comparison between unsigned char and char (during the comparison both are converted to a type that can safely represent all their possible values), and it is right on this point. It then informs you that this won't be the case for unsigned int and int: during the comparison, one of the two is converted to a type that cannot fully represent its values.

For completeness, I also checked it for short: the compiler behaves the same way as for chars and, as expected, there are no errors at runtime.


Related to this topic, I recently asked this question (albeit C++-oriented).
