This answer shows how to get the (exact) top 64-bits from a 64x64 bit multiply on a system that doesn't support 128-bit integers. The answer by @amdn will give better performance on systems that do support 128-bit integers.
The diagram below shows one method for computing a 128-bit product from two 64-bit numbers. Each black rectangle represents a 64-bit number. The 64-bit inputs to the method, X
and Y
, are divided into 32-bits chunks labeled a
, b
, c
, and d
. Then four 32x32 bit multiplications are performed, giving four 64-bit products labeled a*c
, b*c
, a*d
, and b*d
. The four products must be shifted and added to compute the final answer.
Note that the lower 32-bits of the 128-bit product are solely determined by the lower 32-bits of partial product b*d
. The next 32-bits are determined by the lower 32-bits of the following
mid34 = ((b*c) & 0xffffffff) + ((a*d) & 0xffffffff) + ((b*d) >> 32);
Note that mid34
is the sum of three 32-bit numbers and therefore is in fact a 34-bit sum. The upper two bits of mid34
act as a carry into the top 64-bits of the 64x64 bit multiply.
Which brings us to the demo code. The top64
function computes the upper 64-bits of a 64x64 multiply. It's a little verbose to allow for the calculation of the lower 64-bits to be shown in a comment. The main
function takes advantage of 128-bit integers to verify the results with a simple test case. Further testing is left as an exercise for the reader.
#include <stdio.h>
#include <stdint.h>
typedef unsigned __int128 uint128_t;
uint64_t top64( uint64_t x, uint64_t y )
{
uint64_t a = x >> 32, b = x & 0xffffffff;
uint64_t c = y >> 32, d = y & 0xffffffff;
uint64_t ac = a * c;
uint64_t bc = b * c;
uint64_t ad = a * d;
uint64_t bd = b * d;
uint64_t mid34 = (bd >> 32) + (bc & 0xffffffff) + (ad & 0xffffffff);
uint64_t upper64 = ac + (bc >> 32) + (ad >> 32) + (mid34 >> 32);
// uint64_t lower64 = (mid34 << 32) | (bd & 0xffffffff);
return upper64;
}
int main( void )
{
uint64_t x = 0x0000000100000003;
uint64_t y = 0x55555555ffffffff;
uint128_t m = x, n = y;
uint128_t p = m * n;
uint64_t top = p >> 64;
printf( "%016llx %016llx\n", top, top64( x, y ) );
}