What is the right way to find the average of two values?

Question

I recently learned that signed integer overflow is undefined behavior in C (side question: is it also UB in C++?).

Often in C programming you need to find the average of two values a and b. However, computing (a+b)/2 can overflow and cause undefined behavior.

So my question is - what is the right way to find the average of two values a and b in C?

Answer 1:

With help from Secure Coding

if (((si_b > 0) && (si_a > (INT_MAX - si_b))) ||
    ((si_b < 0) && (si_a < (INT_MIN - si_b))))
{
  /* will overflow, so use difference method */
  return si_b + (si_a - si_b) / 2;
}
else
{
  /* the addition will not overflow */
  return (si_a + si_b) / 2;
}

ADDENDUM

Thanks to @chux for pointing out the rounding problem. Here's a version that's tested for correct rounding...

int avgnoov (int si_a, int si_b)
{
    if ((si_b > 0) && (si_a > (INT_MAX - si_b)))
    {
      /* will overflow, so use difference method */
      /* both si_a and si_b > 0; 
          we want difference also > 0
          so rounding works correctly */
      if (si_a >= si_b)
        return si_b + (si_a - si_b) / 2;
      else
        return si_a + (si_b - si_a) / 2;
    } 
    else if ((si_b < 0) && (si_a < (INT_MIN - si_b)))
    {
      /* will overflow, so use difference method */
      /* both si_a and si_b < 0; 
          we want difference also < 0
          so rounding works correctly */
      if (si_a <= si_b)
        return si_b + (si_a - si_b) / 2;
      else
        return si_a + (si_b - si_a) / 2;
    }
    else
    {
     /* the addition will not overflow */
      return (si_a + si_b) / 2;
    }
}

Answer 2:

(a >> 1) + (b >> 1) + (((a & 1) + (b & 1)) >> 1)

The shift expression (x >> i) in C integer arithmetic is equivalent to division by 2 to the power of i (for non-negative x). So (a >> 1) + (b >> 1) is the same as a/2 + b/2. However, the mean of the truncated low bits needs to be added as well. It can be obtained by masking each low bit (a & 1), adding them ((a & 1) + (b & 1)), and halving the sum (((a & 1) + (b & 1)) >> 1). The mean becomes (a >> 1) + (b >> 1) + (((a & 1) + (b & 1)) >> 1)

Note: the reason to use >> and & rather than / and % as the division and remainder operators is one of efficiency.

Answer 3:

A simple approach is the following

int c = a / 2 + ( b + a % 2 ) / 2;

For example a and b can be represented as

a = 2 * n + r1;
b = 2 * m + r2;

Then

( a + b ) / 2 => ( 2 * n + r1 + 2 * m + r2 ) / 2 => 2 * n / 2 + ( b + r1 ) / 2

And the last expression gives you

=> a / 2 + ( b + a % 2 ) / 2

The more correct expression is the following

int c = a / 2 + b / 2 + ( a % 2 + b % 2 ) / 2;

For example if we have

int a = INT_MAX;
int b = INT_MAX;

then c calculated as

int c = a / 2 + b / 2 + ( a % 2 + b % 2 ) / 2;

will give c == INT_MAX

EDIT: an interesting difference was found between the behavior of C operators and that of mathematical operators. For example, mathematically -1 can be represented as

-1 = -1 * 2 + 1

that is according to the formula

a = 2 * n + r1

2 * n shall be an integer number less than or equal to a

So the number that is less than -1 is -2. :)

I think the general formula I showed would work provided that, for an odd negative number, the even negative number less than it is used.

It seems that the correct formula is

int c = ( a < 0 ? a & ~1 : a ) / 2 + 
        ( b < 0 ? b & ~1 : b ) / 2 + 
        ( ( a & 1 ) + ( b & 1 ) ) / 2;

It is important to note that from the mathematical point of view the average of -1 and -2, rounded down, shall be equal to -2, and the formula gives that result. :)
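
The difference between C's truncating division and the mathematical (round-down) convention discussed in this answer can be sketched as follows. The helper names are hypothetical, and the sketch assumes C99 or later, where / truncates toward zero and % takes the sign of the dividend:

```c
/* Hypothetical helpers, assuming C99 or later, where / truncates toward
   zero and % takes the sign of the dividend. */

/* Halve x, rounding toward zero: -1 -> 0, -3 -> -1. */
int half_trunc(int x)
{
    return x / 2;
}

/* Halve x, rounding toward negative infinity: -1 -> -1, -3 -> -2. */
int half_floor(int x)
{
    /* x % 2 is -1 only for odd negative x; subtract 1 in that case */
    return x / 2 - (x % 2 < 0);
}
```

With these, -1 equals 2 * half_floor(-1) + 1, which matches the decomposition -1 = -1 * 2 + 1 used above.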

Answer 4:

If you are concerned about overflow, you could cast the values to a larger type to perform the math, and then do the bounds checking.
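
A minimal sketch of this idea, assuming the implementation provides the fixed-width types int32_t and int64_t from <stdint.h>. The function name average_i32 is hypothetical:

```c
#include <stdint.h>

/* Hypothetical helper: average of two 32-bit ints, truncated toward zero.
   Assumes the implementation provides int32_t and int64_t. */
int32_t average_i32(int32_t a, int32_t b)
{
    /* The sum is computed in 64 bits, so it cannot overflow. */
    int64_t sum = (int64_t)a + b;
    /* The average lies between a and b, so it fits back into int32_t. */
    return (int32_t)(sum / 2);
}
```

Because the average of two values always lies between them, the narrowing conversion at the end cannot lose information in this particular case.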

Answer 5:

This is from Calculating the average of two integer numbers rounded towards zero in a single instruction cycle:

(a >> 1) + (b >> 1) + (a & b & 0x1)

You must consider that:

It's implementation-defined whether right shifting a negative integer shifts zeros or ones into the high-order bits. Many CPUs have two different instructions: an arithmetic shift right (preserves the sign bit) and a logical shift right (doesn't preserve the sign bit). The compiler is allowed to choose either (most compilers choose an arithmetic shift instruction).

ISO/IEC 9899:2011 §6.5.7 Bitwise shift operators

¶5 The result of E1 >> E2 is E1 right-shifted E2 bit positions. [CUT] If E1 has a signed type and a negative value, the resulting value is implementation-defined.

Changing the expression to:
```
a / 2 + b / 2 + (a & b & 0x1)
```
isn't a solution, since right shifts are equivalent to division by a power of 2 only for positive or unsigned numbers.
Also, (a & b & 0x1) isn't well defined. This term should be non-zero when both a and b are odd, but it fails with the one's complement representation, and ISO C, section 6.2.6.2/2, states that an implementation can choose one of three different representations for signed integer types:
- two's complement
- one's complement
- sign/magnitude
(in practice, two's complement is by far the most common).

Answer 6:

The simplest (and usually fastest) way to average two int over the entire range [INT_MIN...INT_MAX] is to resort to a wider integer type. (Suggested by @user3100381.) Let us call that int2x.

int average_int(int a, int b) {
  return ((int2x) a + b)/2;
}

Of course this requires a wider type to exist, so let's look at a solution that does not require a wider type.

Challenges:

Q: When one int is odd and the other is even, which way should rounding occur?
A: Follow average_int() above and round toward 0 (truncate).

Q: Can code use %?
A: With pre-C99 code, the result of a % 2 allows for different results when a < 0. So let us not use %.

Q: Does int need to have a roughly symmetric range of positive and negative numbers?
A: Since C99, the number of negative values is the same as (or 1 more than) the number of positive values. Let us try not to require this.

SOLUTION:

Perform tests to determine whether overflow may occur. If not, simply use (a + b) / 2. Otherwise, add half the difference (signed the same as the answer) to the smaller value.

The following gives the same answer as average_int() without resorting to a wider integer type. It protects against int overflow and does not require INT_MIN + INT_MAX to be 0 or -1. It does not depend on the encoding being 2's complement, 1's complement or sign-magnitude.

int avgC2(int a, int b) {
  if (a >= 0) {
    if (b > (INT_MAX - a)) {
      // (a+b) > INT_MAX
      if (a >= b) {
        return (a - b) / 2 + b;
      } else {
        return (b - a) / 2 + a;
      }
    }
  } else {
    if (b < (INT_MIN - a)) {
      // (a+b) < INT_MIN
      if (a <= b) {
        return (a - b) / 2 + b;
      } else {
        return (b - a) / 2 + a;
      }
    }
  }
  return (a + b) / 2;
}

At most, 3 if()s occur with any pair of int.

Answer 7:

If you only have to deal with unsigned integer types (and can think in binary), you can decompose your addition into digit and carry. We can write a+b (in unlimited precision) as (a^b) + ((a&b)<<1), so (a+b)/2 is simply ((a^b)>>1) + (a&b). This last expression fits within the common type of a and b, so you can use it in your code:

unsigned semisum(unsigned a, unsigned b)
{
    return ((a^b)>>1) + (a&b);
}

Answer 8:

The simplest way to avoid overflow when there are only 2 elements would be:

(a/2) + (b/2) = average

For more elements, you could use:

(a/x) + (b/x) + (c/x) + (d/x) ..... = average //x = amount of elements

From a mathematical point of view, this will never overflow if none of the original values has done so previously, as you do not really add them all together, but rather divide them before adding them together. Thus no result of any operation performed during the calculation, including the final result, will ever be larger (to either side of 0) than the largest initial element (assuming you only work with Real Numbers).

So do the following:

Determine the amount of elements in 'C', let's call it total.
Declare a value to store the average, let's call it average.
Declare a value to store the remainders, let's call it remainder.
Iterate through them and:
- Divide the current element by the total amount, total.
- Add the result to the average, average.
- Add the remainder of the divided values together, remainder.
Divide the remainders as well and add it to the average, average.
Do with the average what you need/intend to.

This will give you an answer off by a maximum of 1 (Decimal numeric system [base 10]). I don't know C++ yet, so I can only give you an example in C#.

Pseudo code in C# (just to provide an idea):

int[] C = new int[20];            //The array of elements.
int total = C.Length;             //The total amount of elements.
int average = 0;                  //The variable to contain the result.
int remainder = 0;                //The variable to contain all the smaller bits.
foreach (int element in C)        //Iteration
{
    int temp = (element / total); //Divide the current element by the total.
    average = average + temp;     //Add the result to the average.
    temp = (element % total);     //Get the remainder (the part that was not divided)
    remainder = remainder + temp; //Add remainder to the other remainders.
}
average = average + (remainder / total); // Adds the divided remainders to the average.

Compressed C# example:

int[] C = new int[20];               //The array of elements.
int total = C.Length;                //The total amount of elements.
int average = 0;                     //The variable to contain the result.
int remainder = 0;                   //The variable to contain all the smaller bits.
foreach (int element in C)           //Iteration
{
    average += (element / total);    //Add the current element divided by the total to the average.
    remainder += ( element % total); //Add the remainders together.
}
average += (remainder / total); //Adds the divided remainders to the average.

Source: https://stackoverflow.com/questions/24920503/what-is-the-right-way-to-find-the-average-of-two-values

Tags

undefined-behavior

integer-overflow