About the use of signed integers in C family of languages

后端 未结 5 1011
一整个雨季
一整个雨季 2021-01-11 17:57

When using integer values in my own code, I always try to consider the signedness, asking myself if the integer should be signed or unsigned.

When I\'m sure the valu

相关标签:
5条回答
  • 2021-01-11 18:25

    From the C FAQ:

    The first question in the C FAQ is which integer type should we decide to use?

    If you might need large values (above 32,767 or below -32,767), use long. Otherwise, if space is very important (i.e. if there are large arrays or many structures), use short. Otherwise, use int. If well-defined overflow characteristics are important and negative values are not, or if you want to steer clear of sign-extension problems when manipulating bits or bytes, use one of the corresponding unsigned types.

    Another question concerns types conversions:

    If an operation involves both signed and unsigned integers, the situation is a bit more complicated. If the unsigned operand is smaller (perhaps we're operating on unsigned int and long int), such that the larger, signed type could represent all values of the smaller, unsigned type, then the unsigned value is converted to the larger, signed type, and the result has the larger, signed type. Otherwise (that is, if the signed type can not represent all values of the unsigned type), both values are converted to a common unsigned type, and the result has that unsigned type.

    You can find it here. So basically using unsigned integers, mostly for arithmetic conversions can complicate the situation since you'll have to either make all your integers unsigned, or be at the risk of confusing the compiler and yourself, but as long as you know what you are doing, this is not really a risk per se. However, it could introduce simple bugs.

    And when it is a good to use unsigned integers? one situation is when using bitwise operations:

    The << operator shifts its first operand left by a number of bits given by its second operand, filling in new 0 bits at the right. Similarly, the >> operator shifts its first operand right. If the first operand is unsigned, >> fills in 0 bits from the left, but if the first operand is signed, >> might fill in 1 bits if the high-order bit was already 1. (Uncertainty like this is one reason why it's usually a good idea to use all unsigned operands when working with the bitwise operators.)

    taken from here And I've seen this somewhere:

    If it was best to use unsigned integers for values that are never negative, we would have started by using unsigned int in the main function int main(int argc, char* argv[]). One thing is sure, argc is never negative.

    EDIT:

    As mentioned in the comments, the signature of main is due to historical reasons and apparently it predates the existence of the unsigned keyword.

    0 讨论(0)
  • 2021-01-11 18:27
    • a signed return value might yield more information (think error-numbers, 0 is sometimes a valid answer, -1 indicates error, see man read) ... which might be relevant especially for developers of libraries.

    • if you are worrying about the one extra bit you gain when using unsigned instead of signed then you are probably using the wrong type anyway. (also kind of "premature optimization" argument)

    • languages like python, ruby, jscript etc are doing just fine without signed vs unsigned. that might be an indicator ...

    0 讨论(0)
  • 2021-01-11 18:29

    Unsigned intgers are an artifact from the past. This is from the time, where processors could do unsigned arithmetic a little bit faster.

    This is a case of premature optimization which is considered evil.

    Actually, in 2005 when AMD introduced x86_64 (or AMD64, how it was then called), the 64 bit architecture for x86, they brought the ghosts of the past back: If a signed integer is used as an index and the compiler can not prove that it is never negative, is has to insert a 32 to 64 bit sign extension instruction - because the default 32 to 64 bit extension is unsigned (the upper half of a 64 bit register gets cleard if you move a 32 bit value into it).

    But I would recommend against using unsigned in any arithmetic at all, being it pointer arithmetic or just simple numbers.

    for( unsigned int i = foo.Length() - 1; i >= 0; --i ) {}

    Any recent compiler will warn about such an construct, with condition ist always true or similar. With using a signed variable you avoid such pitfalls at all. Instead use ptrdiff_t.

    A problem might be the c++ library, it often uses an unsigned type for size_t, which is required because of some rare corner cases with very large sizes (between 2^31 and 2^32) on 32 bit systems with certain boot switches ( /3GB windows).

    There are many more, comparisons between signed and unsigned come to my mind, where the signed value automagically gets promoted to a unsigned and thus becomes a huge positive number, when it has been a small negative before.

    One exception for using unsigned exists: For bit fields, flags, masks it is quite common. Usually it doesn't make sense at all to interpret the value of these variables as a magnitude, and the reader may deduce from the type that this variable is to be interpreted in bits.

    The result will never be a negative value (as the section number, by the way). So why use a signed integer for this?

    Because you might want to compare the return value to a signed value, which is actually negative. The comparison should return true in that case, but the C standard specifies that the signed get promoted to an unsigned in that case and you will get a false instead. I don't know about ObjectiveC though.

    0 讨论(0)
  • 2021-01-11 18:37

    When using integer values in my own code, I always try to consider the signedness, asking myself if the integer should be signed or unsigned.

    When I'm sure the value will never need to be negative, I then use an unsigned integer. And I have to say this happen most of the time.

    To carefully consider which type that is most suitable each time you declare a variable is very good practice! This means you are careful and professional. You should not only consider signedness, but also the potential max value that you expect this type to have.

    The reason why you shouldn't use signed types when they aren't needed have nothing to do with performance, but with type safety. There are lots of potential, subtle bugs that can be caused by signed types:

    • The various forms of implicit promotions that exist in C can cause your type to change signedness in unexpected and possibly dangerous ways. The integer promotion rule that is part of the usual arithmetic conversions, the lvalue conversion upon assignment, the default argument promotions used by for example VA lists, and so on.

    • When using any form of bitwise operators or similar hardware-related programming, signed types are dangerous and can easily cause various forms of undefined behavior.

    By declaring your integers unsigned, you automatically skip past a whole lot of the above dangers. Similarly, by declaring them as large as unsigned int or larger, you get rid of lots of dangers caused by the integer promotions.

    Both size and signedness are important when it comes to writing rugged, portable and safe code. This is the reason why you should always use the types from stdint.h and not the native, so-called "primitive data types" of C.


    So I asked myself: «is there a good reason for this, or do people just use signed integers because the don't care»?

    I don't really think it is because they don't care, nor because they are lazy, even though declaring everything int is sometimes referred to as "sloppy typing" - which means sloppily picked type more than it means too lazy to type.

    I rather believe it is because they lack deeper knowledge of the various things I mentioned above. There's a frightening amount of seasoned C programmers who don't know how implicit type promotions work in C, nor how signed types can cause poorly-defined behavior when used together with certain operators.

    This is actually a very frequent source of subtle bugs. Many programmers find themselves staring at a compiler warning or a peculiar bug, which they can make go away by adding a cast. But they don't understand why, they simply add the cast and move on.


    for( unsigned int i = foo.Length() - 1; i >= 0; --i ) {}

    To me, this is just bad design

    Indeed it is.

    Once upon a time, down-counting loops would yield more effective code, because the compiler pick add a "branch if zero" instruction instead of a "branch if larger/smaller/equal" instruction - the former is faster. But this was at a time when compilers were really dumb and I don't believe such micro-optimizations are relevant any longer.

    So there is rarely ever a reason to have a down-counting loop. Whoever made the argument probably just couldn't think outside the box. The example could have been rewritten as:

    for(unsigned int i=0; i<foo.Length(); i++)
    {
      unsigned int index = foo.Length() - i - 1;
      thing[index] = something;
    }
    

    This code should not have any impact on performance, but the loop itself turned a whole lot easier to read, while at the same time fixing the bug that your example had.

    As far as performance is concerned nowadays, one should probably spend the time pondering about which form of data access that is most ideal in terms of data cache use, rather than anything else.


    Some people may also say that signed integers may be useful, even for non-negative values, to provide an error flag, usually -1.

    That's a poor argument. Good API design uses a dedicated error type for error reporting, such as an enum.

    Instead of having some hobbyist-level API like

    int do_stuff (int a, int b); // returns -1 if a or b were invalid, otherwise the result
    

    you should have something like:

    err_t do_stuff (int32_t a, int32_t b, int32_t* result);
    
    // returns ERR_A is a is invalid, ERR_B if b is invalid, ERR_XXX if... and so on
    // the result is stored in [result], which is allocated by the caller
    // upon errors the contents of [result] remain untouched
    

    The API would then consistently reserve the return of every function for this error type.

    (And yes, many of the standard library functions abuse return types for error handling. This is because it contains lots of ancient functions from a time before good programming practice was invented, and they have been preserved the way they are for backwards-compatibility reasons. So just because you find a poorly-written function in the standard library, you shouldn't run off to write an equally poor function yourself.)


    Overall, it sounds like you know what you are doing and giving signedness some thought. That probably means that knowledge-wise, you are actually already ahead of the people who wrote those posts and guides you are referring to.

    The Google style guide for example, is questionable. Similar could be said about lots of other such coding standards that use "proof by authority". Just because it says Google, NASA or Linux kernel, people blindly swallow them no matter the quality of the actual contents. There are good things in those standards, but they also contain subjective opinions, speculations or blatant errors.

    Instead I would recommend referring to real professional coding standards instead, such as MISRA-C. It enforces lots of thought and care for things like signedness, type promotion and type size, where less detailed/less serious documents just skip past it.

    There is also CERT C, which isn't as detailed and careful as MISRA, but at least a sound, professional document (and more focused towards desktop/hosted development).

    0 讨论(0)
  • 2021-01-11 18:41

    There is one heavy-weight argument against widely unsigned integers:

    Premature optimization is the root of all evil.

    We all have at least on one occasion been bitten by unsigned integers. Sometimes like in your loop, sometimes in other contexts. Unsigned integers add a hazard, even though a small one, to your program. And you are introducing this hazard to change the meaning of one bit. One little, tiny, insignificant-but-for-its-sign-meaning bit. On the other hand, the integers we work with in bread and butter applications are often far below the range of integers, more in the order of 10^1 than 10^7. Thus, the different range of unsigned integers is in the vast majority of cases not needed. And when it's needed, it is quite likely that this extra bit won't cut it (when 31 is too little, 32 is rarely enough) and you'll need a wider or an arbitrary-wide integer anyway. The pragmatic approach in these cases is to just use the signed integer and spare yourself the occasional underflow bug. Your time as a programmer can be put to much better use.

    0 讨论(0)
提交回复
热议问题