My program frequently requires the following calculation to be performed:
Given:
Tolerate imprecision and use only the 16 most significant bits of n, d, and x.
Algorithm
while (|n| > 0xffff) n /= 2, sh++
while (|x| > 0xffff) x /= 2, sh++
while (|d| > 0xffff) d /= 2, sh--
r = n*x/d // A 16x16 to 32 multiply followed by a 32/16-bit divide.
Shift r left by sh (or right by -sh when sh is negative).
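The steps above can be sketched in C roughly as follows. This is only an illustration, not a tuned implementation: the names approx_muldiv_u32, approx_muldiv and mag32 are made up for this example, and the code assumes 32-bit inputs, d != 0, and a true result that fits in 32 bits.

```c
#include <stdint.h>

/* Hypothetical helper: approximate n * x / d using only the top
 * 16 bits of each operand. Assumes d != 0 and that the true
 * result fits in 32 bits. */
uint32_t approx_muldiv_u32(uint32_t n, uint32_t x, uint32_t d)
{
    int sh = 0; /* net power of two removed from the quotient */

    while (n > 0xffff) { n >>= 1; sh++; }
    while (x > 0xffff) { x >>= 1; sh++; }
    while (d > 0xffff) { d >>= 1; sh--; }

    /* 16x16 -> 32 multiply, then a divide by a 16-bit value. */
    uint32_t r = (n * x) / d;

    if (sh > 0) {
        /* sh can reach 32; shifting a 32-bit value by 32 is undefined,
         * so return 0 (r is 0 here anyway if the result fits). */
        r = (sh < 32) ? r << sh : 0;
    } else if (sh < 0) {
        r >>= -sh; /* sh is never below -16 */
    }
    return r;
}

/* Magnitude of a signed value, computed in unsigned arithmetic so
 * that INT32_MIN does not overflow. */
static uint32_t mag32(int32_t v)
{
    return v < 0 ? 0u - (uint32_t)v : (uint32_t)v;
}

/* Signed wrapper: do the work on magnitudes and restore the sign.
 * Assumes the magnitude of the result is at most INT32_MAX. */
int32_t approx_muldiv(int32_t n, int32_t x, int32_t d)
{
    int negative = (n < 0) ^ (x < 0) ^ (d < 0);
    uint32_t r = approx_muldiv_u32(mag32(n), mag32(x), mag32(d));
    return negative ? -(int32_t)r : (int32_t)r;
}
```

For small operands such as approx_muldiv(-1000, 2000, 500), no bits are discarded and the result is exact. Whether the compiler actually emits a 32/16-bit divide instruction for the division depends on the target and should be checked in the generated code.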
When a 64-bit divide is expensive, the pre/post-processing here may be worth it in order to do a 32-bit divide instead, which will certainly still be the big chunk of CPU time.
If the compiler cannot be coaxed into doing a 32-bit/16-bit divide, then skip the while (|d| > 0xffff) step and do a 32/32 divide.
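A minimal sketch of that fallback, in unsigned form: d is left at full width, so only n and x contribute to the shift count. The name approx_muldiv_u32_wide is hypothetical, and the same assumptions apply (d != 0, result fits in 32 bits).

```c
#include <stdint.h>

/* Hypothetical variant: reduce only n and x to 16 bits and divide
 * the 32-bit product by the full-width d (a 32/32 divide).
 * Assumes d != 0 and that the true result fits in 32 bits. */
uint32_t approx_muldiv_u32_wide(uint32_t n, uint32_t x, uint32_t d)
{
    int sh = 0; /* power of two removed from the product */

    while (n > 0xffff) { n >>= 1; sh++; }
    while (x > 0xffff) { x >>= 1; sh++; }

    uint32_t r = (n * x) / d; /* 16x16 -> 32 multiply, 32/32 divide */

    /* sh is never negative here; guard the undefined shift by 32. */
    return (sh < 32) ? r << sh : 0;
}
```

Because the product is limited to 32 bits while d is not reduced, the quotient keeps few significant bits when d is large, so this variant trades some accuracy for the simpler divide.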
Use unsigned math where possible.