I need to compute an expression which looks like:
A*B - C*D
where their types are: signed long long int A, B, C, D;
Each number can be really big (not
If the result fits in a long long int, then the expression A*B-C*D is okay in practice, as it performs the arithmetic mod 2^64 and will give the correct result. The problem is knowing whether the result fits in a long long int. To detect this, you can use the following trick using doubles:
if (fabs((double)A*B - (double)C*D) > LLONG_MAX)
    Overflow
else
    return A*B - C*D;
The problem with this approach is that you are limited by the precision of the doubles' significand (53 bits), so you need to limit the products A*B and C*D to about 63+53 bits (or probably a little less).
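A minimal C sketch of this check follows; the function name abcd_double_check is hypothetical. Unlike the snippet above, it recomputes the exact value with unsigned long long wrap-around instead of relying on signed overflow, so the only non-portable step is the final conversion back to long long.

```c
#include <limits.h>
#include <math.h>

/* Returns 1 and stores A*B - C*D in *out when the double estimate says the
   result fits in a long long, 0 otherwise. The estimate is only reliable
   while the products stay well below 2^(63+53); very close to LLONG_MAX a
   double can misjudge the result. */
static int abcd_double_check(long long A, long long B, long long C,
                             long long D, long long *out) {
    double approx = (double)A * B - (double)C * D;
    if (fabs(approx) > (double)LLONG_MAX)
        return 0; /* overflow */
    /* Exact value modulo 2^64 via unsigned arithmetic (well defined);
       converting back to signed is implementation-defined but gives the
       two's-complement value on common platforms. */
    *out = (long long)((unsigned long long)A * B - (unsigned long long)C * D);
    return 1;
}
```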
If you just do all the calculations in long long, the result of applying the formula directly:
(A * B - C * D)
will be accurate as long as the correct result fits into a long long.
Note that this is not standard since it relies on wrap-around signed overflow. (GCC has compiler flags, such as -fwrapv, which enable this.)
Here's a work-around that only relies on implementation-defined behavior of casting unsigned integer to signed integer. But this can be expected to work on almost every system today.
(long long)((unsigned long long)A * B - (unsigned long long)C * D)
This casts the inputs to unsigned long long, where the overflow behavior is guaranteed to be wrap-around by the standard. Casting back to a signed integer at the end is the implementation-defined part, but will work on nearly all environments today.
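Wrapped in a function (the name abcd_wrap is made up for illustration), the work-around looks like the sketch below. Like the answer, it assumes the true result fits in a long long; it does not detect overflow.

```c
/* Computes A*B - C*D, assuming the true result fits in a long long.
   Every intermediate step is done in unsigned long long, where overflow
   wraps modulo 2^64 by definition, so no undefined behavior occurs. Only
   the final conversion back to long long is implementation-defined
   (GCC documents it as reduction modulo 2^64). */
static long long abcd_wrap(long long A, long long B, long long C, long long D) {
    unsigned long long ab = (unsigned long long)A * (unsigned long long)B;
    unsigned long long cd = (unsigned long long)C * (unsigned long long)D;
    return (long long)(ab - cd);
}
```

For instance, with A = B = 3037000500, C = 3037000499 and D = 3037000501, the product A*B alone exceeds LLONG_MAX, yet abcd_wrap returns the exact result 1.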
If you need a more pedantic solution, I think you have to use "long arithmetic".
You could write each number in an array, each element being a digit, and do the calculations as polynomials. Take the resulting polynomial, which is an array, and compute the result by multiplying each element of the array by 10 raised to the power of its position in the array (the first position having the largest power and the last having power zero).
The number 123 can be expressed as:
123 = 100 * 1 + 10 * 2 + 3
for which you just create an array [1 2 3].
You do this for all numbers A, B, C and D, and then you multiply them as polynomials. Once you have the resulting polynomial, you just reconstruct the number from it.
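A sketch of the digit-array idea in C, restricted to non-negative inputs to keep it short; to_digits, poly_mul and abcd_poly are hypothetical names. Digits are stored most significant first, as in the [1 2 3] example. The final evaluation at 10 uses unsigned arithmetic, which is exact modulo 2^64 and therefore gives the right answer whenever the result fits in a long long.

```c
#define MAXD 19 /* a non-negative long long has at most 19 decimal digits */

/* Writes the digits of n (assumed >= 0) into d, most significant first. */
static int to_digits(long long n, int d[MAXD]) {
    int tmp[MAXD], len = 0;
    do { tmp[len++] = (int)(n % 10); n /= 10; } while (n > 0);
    for (int i = 0; i < len; i++) d[i] = tmp[len - 1 - i];
    return len;
}

/* Polynomial product (plain convolution, no carrying between digits). */
static int poly_mul(const int *a, int la, const int *b, int lb, long long *out) {
    for (int i = 0; i < la + lb - 1; i++) out[i] = 0;
    for (int i = 0; i < la; i++)
        for (int j = 0; j < lb; j++)
            out[i + j] += (long long)a[i] * b[j];
    return la + lb - 1;
}

/* A*B - C*D for non-negative inputs, assuming the result fits. */
static long long abcd_poly(long long A, long long B, long long C, long long D) {
    int a[MAXD], b[MAXD], c[MAXD], d[MAXD];
    long long ab[2 * MAXD], cd[2 * MAXD], diff[2 * MAXD];
    int la = to_digits(A, a), lb = to_digits(B, b);
    int lc = to_digits(C, c), ld = to_digits(D, d);
    int lab = poly_mul(a, la, b, lb, ab);
    int lcd = poly_mul(c, lc, d, ld, cd);
    /* Subtract coefficients of equal powers of 10 (right-aligned arrays). */
    int len = lab > lcd ? lab : lcd;
    for (int i = 0; i < len; i++) {
        long long x = i >= len - lab ? ab[i - (len - lab)] : 0;
        long long y = i >= len - lcd ? cd[i - (len - lcd)] : 0;
        diff[i] = x - y;
    }
    /* Evaluate the polynomial at 10 with Horner's rule, modulo 2^64. */
    unsigned long long r = 0;
    for (int i = 0; i < len; i++)
        r = r * 10u + (unsigned long long)diff[i];
    return (long long)r;
}
```

For example, abcd_poly(123, 45, 12, 3) builds the coefficient arrays [4 13 22 15] and [3 6], subtracts them to [4 13 19 9], and evaluates that to 5499.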
This seems too trivial I guess.
But A*B is the one that could overflow.
You could do the following, without losing precision:
A*B - C*D = A(D+E) - (A+F)D
          = AD + AE - AD - DF
          = AE - DF
where E and F are smaller quantities:
E = B - D (far smaller than B when B and D are close)
F = C - A (far smaller than C when C and A are close)
This decomposition can be done further.
As @Gian pointed out, care might need to be taken during the subtraction operation if the type is unsigned long long.
For example, with the case you have in the question, it takes just one iteration:
MAX * MAX - (MAX - 1) * (MAX + 1)
with A = MAX, B = MAX, C = MAX - 1, D = MAX + 1
E = B - D = -1
F = C - A = -1
AE - DF = {MAX * -1} - {(MAX + 1) * -1} = -MAX + MAX + 1 = 1
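The single rewrite step can be sketched as follows (abcd_diff_step is a hypothetical name). It assumes signed inputs where B is close to D and C is close to A, so that neither A*E nor D*F overflows; no overflow checking is done. Since MAX + 1 must itself fit in a long long, the usage note picks MAX = 3037000500 (just above sqrt(2^63)) as an illustrative value.

```c
/* One application of A*B - C*D = A*E - D*F with E = B - D, F = C - A.
   Assumes the differences and the two smaller products fit. */
static long long abcd_diff_step(long long A, long long B, long long C, long long D) {
    long long E = B - D; /* so B = D + E */
    long long F = C - A; /* so C = A + F */
    return A * E - D * F;
}
```

abcd_diff_step(3037000500, 3037000500, 3037000499, 3037000501) reproduces the worked example: E = F = -1 and the function returns 1, even though A*B itself exceeds LLONG_MAX.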
Choose K = a big number (e.g. K = A - sqrt(A)).
A*B - C*D = (A-K)*(B-K) - (C-K)*(D-K) + K*(A-C+B-D); // Avoid overflow.
Why?
(A-K)*(B-K) = A*B - K*(A+B) + K^2
(C-K)*(D-K) = C*D - K*(C+D) + K^2
=>
(A-K)*(B-K) - (C-K)*(D-K) = A*B - K*(A+B) + K^2 - {C*D - K*(C+D) + K^2}
(A-K)*(B-K) - (C-K)*(D-K) = A*B - C*D - K*(A+B) + K*(C+D) + K^2 - K^2
(A-K)*(B-K) - (C-K)*(D-K) = A*B - C*D - K*(A+B-C-D)
=>
A*B - C*D = (A-K)*(B-K) - (C-K)*(D-K) + K*(A+B-C-D)
=>
A*B - C*D = (A-K)*(B-K) - (C-K)*(D-K) + K*(A-C+B-D)
Note that this relies on A, B, C and D being big numbers that are close to one another, so that A-C and B-D are small numbers.
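A sketch of the K-shift in C; isqrt_ll and abcd_shift are hypothetical names. It assumes A >= 0 and that all four inputs are close to each other, so the shifted products and K*(A-C+B-D) fit in a long long. An integer square root is used instead of sqrt() from math.h to keep K exact.

```c
/* floor(sqrt(n)) for n >= 0, by binary search. */
static long long isqrt_ll(long long n) {
    long long lo = 0, hi = 3037000499LL; /* floor(sqrt(LLONG_MAX)) */
    while (lo < hi) {
        long long mid = lo + (hi - lo + 1) / 2;
        if (mid <= n / mid) lo = mid; /* mid*mid <= n, tested without overflow */
        else hi = mid - 1;
    }
    return lo;
}

/* A*B - C*D = (A-K)*(B-K) - (C-K)*(D-K) + K*(A-C+B-D), with K = A - sqrt(A). */
static long long abcd_shift(long long A, long long B, long long C, long long D) {
    long long K = A - isqrt_ll(A);
    return (A - K) * (B - K) - (C - K) * (D - K) + K * ((A - C) + (B - D));
}
```

With inputs 3037000500, 3037000500, 3037000499 and 3037000501, K = 3036945392 and the shifted factors are 55108, 55108, 55107 and 55109, so each product stays near 3*10^9, the K term vanishes because (A-C)+(B-D) = 0, and the function returns 1.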
While a signed long long int will not hold A*B, two of them will. So A*B could be decomposed into three terms of different exponents, each of them fitting in one signed long long int.
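A1 = A >> 32;            // high half (arithmetic shift assumed for negative A)
A0 = A & 0xffffffff;     // low half, 0 .. 2^32-1
B1 = B >> 32;
B0 = B & 0xffffffff;
AB_0 = A0*B0;            // weight 1; can reach almost 2^64, so it needs unsigned arithmetic
AB_1 = A0*B1 + A1*B0;    // weight 2^32; the sum can also exceed 63 bits
AB_2 = A1*B1;            // weight 2^64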
Same for C*D.
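Written out, the split gives each term a weight (assuming >> on a negative A is an arithmetic shift, so that A_1 = floor(A / 2^32)); the last line is the per-weight subtraction that the carry handling then works on:

```latex
\begin{aligned}
A &= A_1 \cdot 2^{32} + A_0, \qquad B = B_1 \cdot 2^{32} + B_0, \qquad 0 \le A_0, B_0 < 2^{32} \\
A B &= \underbrace{A_1 B_1}_{AB_2} \cdot 2^{64}
     + \underbrace{(A_0 B_1 + A_1 B_0)}_{AB_1} \cdot 2^{32}
     + \underbrace{A_0 B_0}_{AB_0} \\
A B - C D &= (AB_2 - CD_2) \cdot 2^{64} + (AB_1 - CD_1) \cdot 2^{32} + (AB_0 - CD_0)
\end{aligned}
```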
Following the straightforward way, the subtraction could be done on every pair of AB_i and CD_i likewise, using an additional carry bit (strictly speaking, a 1-bit integer) for each. So if we say E=A*B-C*D, you get something like:
E_00 = AB_0 - CD_0
E_01 = ((AB_0 > CD_0) != (E_00 > 0)) ? 1 : 0   // carry bit: 1 if the subtraction wrapped around
E_10 = AB_1 - CD_1
...
We continue by transferring the upper half of E_10 to E_20 (shift by 32 and add, then erase the upper half of E_10).
Now you can get rid of the carry bit E_11 by adding it with the right sign (obtained from the non-carry part) to E_20. If this triggers an overflow, the result wouldn't fit either.
E_10 now has enough 'space' to take the upper half from E_00 (shift, add, erase) and the carry bit E_01.
E_10 may be larger now again, so we repeat the transfer to E_20.
At this point, E_20 must become zero, otherwise the result won't fit. The upper half of E_10 is empty as a result of the transfer too.
The final step is to transfer the lower half of E_10 into E_00 again.
If the expectation that E=A*B-C*D would fit in a signed long long int holds, we now have
E_20=0
E_10=0
E_00=E
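A compact C sketch of the same limb idea, offered as a variation rather than the answer's exact bookkeeping: instead of tracking E_00, E_01, E_10, ... separately, it builds each product as a 128-bit two's-complement pair (high word, low word) from 32-bit limbs, subtracts with a single borrow, and then checks that the high word is just the sign extension of the low word. The names mul_wide and abcd_exact are hypothetical.

```c
#include <limits.h>

/* Full 128-bit product a*b as two 64-bit words (two's complement). */
static void mul_wide(long long a, long long b,
                     unsigned long long *hi, unsigned long long *lo) {
    const unsigned long long M = 0xffffffffULL;
    unsigned long long ua = (unsigned long long)a, ub = (unsigned long long)b;
    unsigned long long a0 = ua & M, a1 = ua >> 32;
    unsigned long long b0 = ub & M, b1 = ub >> 32;
    unsigned long long p00 = a0 * b0, p01 = a0 * b1;
    unsigned long long p10 = a1 * b0, p11 = a1 * b1;
    /* Middle column: below 3 * 2^32, so it cannot wrap. */
    unsigned long long mid = (p00 >> 32) + (p01 & M) + (p10 & M);
    *lo = (mid << 32) | (p00 & M);
    *hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
    /* Convert the unsigned high word into the signed one. */
    if (a < 0) *hi -= ub;
    if (b < 0) *hi -= ua;
}

/* Returns 1 and stores A*B - C*D in *out if it fits in a long long, else 0. */
static int abcd_exact(long long A, long long B, long long C, long long D,
                      long long *out) {
    unsigned long long abh, abl, cdh, cdl;
    mul_wide(A, B, &abh, &abl);
    mul_wide(C, D, &cdh, &cdl);
    unsigned long long lo = abl - cdl;
    unsigned long long hi = abh - cdh - (abl < cdl); /* borrow from the low word */
    if (lo <= (unsigned long long)LLONG_MAX) {
        if (hi != 0) return 0;          /* positive but too large */
        *out = (long long)lo;
    } else {
        if (hi != ~0ULL) return 0;      /* negative but too small */
        *out = -(long long)(~lo) - 1;   /* lo - 2^64, computed without overflow */
    }
    return 1;
}
```

Everything except the final value is computed in unsigned arithmetic, so the sketch avoids both signed overflow and implementation-defined conversions.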