Overflow issues when implementing math formulas

Question

I have heard that, when computing a mean value, start+(end-start)/2 differs from (start+end)/2 because the latter can overflow. I do not quite understand why the second one can overflow while the first one does not. What are the general rules for implementing a math formula so that it avoids overflow?

Answer 1:

Suppose you are using a computer where the maximum integer value is 10 and you want to compute the average of 5 and 7.

The first method (begin + (end-begin)/2) gives

5 + (7-5)/2 == 5 + 2/2 == 6

The second method, (begin + end)/2, overflows: the intermediate value 12 is over the maximum of 10 that we accept, so it "wraps around" to something else, say 2. (With unsigned numbers it is usual to wrap back around to zero, but with signed numbers you could get a negative number, and in C++ signed overflow is undefined behavior, so nothing is guaranteed.)

12/2 => overflow occurs => 2/2 == 1

Of course, on real computers integers overflow at a much larger value, such as 2^31 or 2^32, instead of 10, but the idea is the same. Unfortunately, there is no "general" way to get rid of overflow that I know of; it depends greatly on the particular algorithm you are using. And even then, things get more complicated: you can get different behaviour depending on the number type used under the hood, and there are other kinds of numerical errors to worry about in addition to overflow and underflow.
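The same effect can be reproduced on a real machine with unsigned 32-bit integers, whose arithmetic is defined to wrap around modulo 2^32. This is only a minimal sketch: the names mean_naive and mean_safe are made up for this illustration, mean_safe assumes start <= end, and the code assumes a platform where int is 32 bits wide.

```cpp
#include <cstdint>

// (start + end) / 2: the intermediate sum may exceed 2^32 - 1 and wrap around.
std::uint32_t mean_naive(std::uint32_t start, std::uint32_t end) {
    return (start + end) / 2;
}

// start + (end - start) / 2: no intermediate value is larger than end,
// provided start <= end (assumption of this sketch).
std::uint32_t mean_safe(std::uint32_t start, std::uint32_t end) {
    return start + (end - start) / 2;
}
```

With start = 4000000000 and end = 4000000002, mean_safe returns 4000000001, while the sum inside mean_naive wraps around and the function returns 1852516353.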

Answer 2:

Both your formulas will overflow, but under different circumstances:

The (start+end) part of your (start+end)/2 formula will overflow when start and end are both close to the integer limit on the same side of the range (i.e. both positive or both negative).
The (end-start) part of your start+(end-start)/2 formula will overflow when start and end have opposite signs (for example, start is positive and end is negative) and both values are close to the respective ends of the representable integer range.

There are no "generic" rules; you do it case by case: look at the parts of your formula, think of situations that could cause overflow, and come up with ways to avoid it. For example, the start+(end-start)/2 formula can be shown to avoid overflow when you average values with the same sign.
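One way to make the case-by-case reasoning concrete is to test each part of the formula before evaluating it. The helpers below are a hypothetical sketch (add_overflows and sub_overflows are names invented here, not standard functions); they compare against the limits of int without ever performing the risky operation.

```cpp
#include <climits>

// True if a + b would fall outside the range of int.
bool add_overflows(int a, int b) {
    if (b > 0) return a > INT_MAX - b;  // sum would exceed INT_MAX
    if (b < 0) return a < INT_MIN - b;  // sum would drop below INT_MIN
    return false;
}

// True if a - b would fall outside the range of int.
bool sub_overflows(int a, int b) {
    if (b < 0) return a > INT_MAX + b;  // difference would exceed INT_MAX
    if (b > 0) return a < INT_MIN + b;  // difference would drop below INT_MIN
    return false;
}
```

For example, add_overflows(INT_MAX, 1) reports the problem with (start+end), and sub_overflows(-2, INT_MAX) reports the problem with (end-start) when start is a large positive value and end is negative.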

This is the hard way; the easy way is to use higher-capacity representations for intermediate results. For example, if you use long long instead of int to do intermediate calculations and copy the results back to int only when you are done, you will avoid overflow assuming that the end result fits in an int.
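A minimal sketch of the "easy way", assuming long long is wider than int (which holds on mainstream platforms but is not guaranteed by the language). The function name mean_wide is made up for this example.

```cpp
// Average of two ints, computed with a wider intermediate type.
// Assumption: long long can hold the sum of any two int values.
int mean_wide(int start, int end) {
    long long sum = static_cast<long long>(start) + end;  // cannot overflow
    return static_cast<int>(sum / 2);  // the mean always fits back into int
}
```

Because the mean of two int values always lies between them, converting the result back to int is safe here; for other formulas you would still have to check that the final result fits.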

Answer 3:

When dealing with integers, integer overflow is probably what you care about when adopting such strategies.

Note that when using the formula a+(b-a)/2 you want to make sure that a <= b. Otherwise you could get the very same problem at the lower bound of the range of representable values. Another option to think about is a/2+b/2, although this approach has drawbacks of its own.
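One drawback of a/2+b/2 is easy to show: each division truncates separately, so the result is off by one when both operands are odd. The sketch below uses made-up names (mean_halves, mean_halves_fixed), and the corrected variant assumes both operands are non-negative.

```cpp
// a/2 + b/2 cannot overflow, but each division discards its own remainder.
int mean_halves(int a, int b) {
    return a / 2 + b / 2;
}

// Adds back the carry from the two discarded remainders.
// Assumption: a >= 0 and b >= 0 (negative operands round differently).
int mean_halves_fixed(int a, int b) {
    return a / 2 + b / 2 + (a % 2 + b % 2) / 2;
}
```

For 5 and 7, mean_halves gives 2 + 3 = 5, whereas mean_halves_fixed gives the expected 6.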

When dealing with floating-point numbers another problem arises: loss of significance, of which catastrophic cancellation is one form. Because the floating-point representation has a limited number of significant digits, accuracy is lost when large numbers are added (even if this is just an intermediate step).

To address this issue of numerical stability, an algorithm such as the following can be used (slightly adapted from Wikipedia):
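The loss of accuracy can be seen directly with a very small experiment. This sketch assumes IEEE 754 double precision with the default rounding mode, which is what mainstream platforms provide; the helper name absorbs_one is invented for the illustration.

```cpp
// Returns true if adding 1.0 to big leaves it unchanged, i.e. the 1.0 is
// lost entirely because big has no significant digits left to hold it.
bool absorbs_one(double big) {
    double sum = big + 1.0;  // rounded to the nearest representable double
    return sum == big;
}
```

Under those assumptions, absorbs_one(9007199254740992.0) (that is, 2^53) is true, while absorbs_one(1000.0) is false.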

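def online_mean(data):
  n = 0        # number of values seen so far
  mean = 0.0   # running mean of the first n values

  for x in data:
    n = n + 1
    delta = x - mean        # distance of x from the current mean
    mean = mean + delta/n   # move the mean by 1/n of that distance

  return mean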

I somehow felt there was a relationship to the formula you've presented above...
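The relationship can be made explicit: for two values, the update mean + (x - mean)/n with n = 2 is exactly start + (end - start)/2. The C++ sketch below (function names are made up, and IEEE 754 doubles are assumed) contrasts the running mean with the sum-then-divide approach, whose intermediate sum can overflow to infinity.

```cpp
#include <cstddef>
#include <vector>

// Sum everything first, then divide: the intermediate sum can overflow.
double naive_mean(const std::vector<double>& data) {
    double sum = 0.0;
    for (double x : data) sum += x;
    return data.empty() ? 0.0 : sum / data.size();
}

// Running mean: the stored value never grows beyond the data themselves.
double running_mean(const std::vector<double>& data) {
    double mean = 0.0;
    std::size_t n = 0;
    for (double x : data) {
        ++n;
        mean += (x - mean) / n;  // same shape as start + (end - start) / 2
    }
    return mean;
}
```

For three values of 1e308, naive_mean overflows to infinity while running_mean returns 1e308. The running mean is not immune either: the difference x - mean can itself overflow when the data have opposite signs and huge magnitudes, just like (end-start) in the integer case.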

Answer 4:

In binary search, we write the following code:

if(start > end){
   return;
}
int mid = start + (end - start) / 2;

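By using start + (end - start) / 2, we avoid the problems pointed out by @dasblinkenlight: array indices are non-negative and start <= end at this point, so end - start cannot overflow.

If we use (start + end) / 2 instead, the sum can overflow, as shown by dasblinkenlight.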
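For context, here is how that line might sit inside a complete iterative binary search. This is a sketch rather than the answerer's actual code: the function name binary_search_index is invented, and it assumes the vector is sorted in ascending order and that its size fits in an int.

```cpp
#include <vector>

// Returns the index of target in the sorted vector v, or -1 if it is absent.
int binary_search_index(const std::vector<int>& v, int target) {
    int start = 0;
    int end = static_cast<int>(v.size()) - 1;  // assumes size fits in int
    while (start <= end) {
        // 0 <= start <= end, so end - start is in range and mid <= end.
        int mid = start + (end - start) / 2;
        if (v[mid] == target) return mid;
        if (v[mid] < target) start = mid + 1;
        else end = mid - 1;
    }
    return -1;
}
```

Searching {1, 3, 5, 7, 9} for 7 returns index 3, and searching it for 4 returns -1.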

Source: https://stackoverflow.com/questions/10882368/overflow-issues-when-implementing-math-formulas

Tags

c++

algorithm

numerical-methods

integer-overflow