Why are unsigned integers error prone?

寵の児 提交于 2019-11-28 20:00:51

One possible aspect is that unsigned integers can lead to somewhat hard-to-spot problems in loops, because the underflow leads to large numbers. I cannot count (even with an unsigned integer!) how many times I made a variant of this bug

for(size_t i = foo.size(); i >= 0; --i)
    ...

Note that, by definition, i >= 0 is always true. (What causes this in the first place is that if i is signed, the compiler will warn about a possible overflow with the size_t of size()).

There are other reasons mentioned Danger – unsigned types used here!, the strongest of which, in my opinion, is the implicit type conversion between signed and unsigned.

Baum mit Augen

One big factor is that it makes loop logic harder: Imagine you want to iterate over all but the last element of an array (which does happen in the real world). So you write your function:

void fun (const std::vector<int> &vec) {
    for (std::size_t i = 0; i < vec.size() - 1; ++i)
        do_something(vec[i]);
}

Looks good, doesn't it? It even compiles cleanly with very high warning levels! (Live) So you put this in your code, all tests run smoothly and you forget about it.

Now, later on, somebody comes along an passes an empty vector to your function. Now with a signed integer, you hopefully would have noticed the sign-compare compiler warning, introduced the appropriate cast and not have published the buggy code in the first place.

But in your implementation with the unsigned integer, you wrap and the loop condition becomes i < SIZE_T_MAX. Disaster, UB and most likely crash!

I want to know how they lead to security bugs?

This is also a security problem, in particular it is a buffer overflow. One way to possibly exploit this would be if do_something would do something that can be observed by the attacker. He might be able to find what input went into do_something, and that way data the attacker should not be able to access would be leaked from your memory. This would be a scenario similar to the Heartbleed bug. (Thanks to ratchet freak for pointing that out in his comment.)

I'm not going to watch a video just to answer a question, but one issue is the confusing conversions which can happen if you mix signed and unsigned values. For example:

#include <iostream>

int main() {
    unsigned n = 42;
    int i = -42;
    if (i < n) {
        std::cout << "All is well\n";
    } else {
        std::cout << "ARITHMETIC IS BROKEN!\n";
    }
}

The promotion rules mean that i is converted to unsigned for the comparison, giving a large positive number and a surprising result.

Although it may only be considered as a variant of the existing answers: Referring to "Signed and unsigned types in interfaces," C++ Report, September 1995 by Scott Meyers, it's particularly important to avoid unsigned types in interfaces.

The problem is that it becomes impossible to detect certain errors that clients of the interface could make (and if they could make them, they will make them).

The example given there is:

template <class T>
  class Array {
  public:
      Array(unsigned int size);
  ...

and a possible instantiation of this class

int f(); // f and g are functions that return
int g(); // ints; what they do is unimportant
Array<double> a(f()-g()); // array size is f()-g()

The difference of the values returned by f() and g() might be negative, for an awful number of reasons. The constructor of the Array class will receive this difference as a value that is implicitly converted to be unsigned. Thus, as the implementor of the Array class, one can not distinguish between an erreonously passed value of -1, and a very large array allocation.

The big problem with unsigned int is that if you subtract 1 from an unsigned int 0, the result isn't a negative number, the result isn't less than the number you started with, but the result is the largest possible unsigned int value.

unsigned int x = 0;
unsigned int y = x - 1;

if (y > x) printf ("What a surprise! \n");

And this is what makes unsigned int error prone. Of course unsigned int works exactly as it is designed to work. It's absolutely safe if you know what you are doing and make no mistakes. But most people make mistakes.

If you are using a good compiler, you turn on all the warnings that the compiler produces, and it will tell you when you do dangerous things that are likely to be mistakes.

The problem with unsigned integer types is that depending upon their size they may represent one of two different things:

  1. Unsigned types smaller than int (e.g. uint8) hold numbers in the range 0..2ⁿ-1, and calculations with them will behave according to the rules of integer arithmetic provided they don't exceed the range of the int type. Under present rules, if such a calculation exceeds the range of an int, a compiler is allowed to do anything it likes with the code, even going so far as to negate the laws of time and causality (some compilers will do precisely that!), and even if the result of the calculation would be assigned back to an unsigned type smaller than int.
  2. Unsigned types unsigned int and larger hold members of the abstract wrapping algebraic ring of integers congruent mod 2ⁿ; this effectively means that if a calculation goes outside the range 0..2ⁿ-1, the system will add or subtract whatever multiple of 2ⁿ would be required to get the value back in range.

Consequently, given uint32_t x=1, y=2; the expression x-y may have one of two meanings depending upon whether int is larger than 32 bits.

  1. If int is larger than 32 bits, the expression will subtract the number 2 from the number 1, yielding the number -1. Note that while a variable of type uint32_t can't hold the value -1 regardless of the size of int, and storing either -1 would cause such a variable to hold 0xFFFFFFFF, but unless or until the value is coerced to an unsigned type it will behave like the signed quantity -1.
  2. If int is 32 bits or smaller, the expression will yield a uint32_t value which, when added to the uint32_t value 2, will yield the uint32_t value 1 (i.e. the uint32_t value 0xFFFFFFFF).

IMHO, this problem could be solved cleanly if C and C++ were to define new unsigned types [e.g. unum32_t and uwrap32_t] such that a unum32_t would always behave as a number, regardless of the size of int (possibly requiring the right-hand operation of a subtraction or unary minus to be promoted to the next larger signed type if int is 32 bits or smaller), while a wrap32_t would always behave as a member of an algebraic ring (blocking promotions even if int were larger than 32 bits). In the absence of such types, however, it's often impossible to write code which is both portable and clean, since portable code will often require type coercions all over the place.

Numeric conversion rules in C and C++ are a byzantine mess. Using unsigned types exposes yourself to that mess to a much greater extent than using purely signed types.

Take for example the simple case of a comparison between two variables, one signed and the other unsigned.

  • If both operands are smaller than int then they will both be converted to int and the comparison will give numerically correct results.
  • If the unsigned operand is smaller than the signed operand then both will be converted to the type of the signed operand and the comparison will give numerically correct results.
  • If the unsigned operand is greater than or equal in size to the signed operand and also greater than or equal in size to int then both will be converted to the type of the unsigned operand. If the value of the signed operand is less than zero this will lead to numerically incorrect results.

To take another example consider multiplying two unsigned integers of the same size.

  • If the operand size is greater than or equal to the size of int then the multiplication will have defined wraparound semantics.
  • If the operand size is smaller than int but greater than or equal to half the size of int then there is the potential for undefined behaviour.
  • If the operand size is less than half the size of int then the multiplication will produce numerically correct results. Assigning this result back to a variable of the original unsigned type will produce defined wraparound semantics.

In addition to range/warp issue with unsigned types. Using mix of unsigned and signed integer types impact significant performance issue for processor. Less then floating point cast, but quite a lot to ignore that. Additionally compiler may place range check for the value and change the behavior of further checks.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!