Efficient computation of the high order bits of a 32 bit integer multiplication

Many CPUs have single assembly opcodes for returning the high order bits of a 32 bit integer multiplication. Normally multiplying two 32 bit integers produc

3 Answers

花落未央

2021-01-04 19:31
gcc 4.3.2, with -O1 optimisation or higher, translated your function exactly as you showed it to IA32 assembly like this:
```
umulhi32:
        pushl   %ebp
        movl    %esp, %ebp
        movl    12(%ebp), %eax
        mull    8(%ebp)
        movl    %edx, %eax
        popl    %ebp
        ret
```
That is just a single 32 bit mull, with the high 32 bits of the result moved from %edx into the return value register (%eax).

That's what you wanted, right? It sounds like you just need to turn up the optimisation level on your compiler ;) You might also be able to push the compiler in the right direction by eliminating the intermediate variable:
```
unsigned int umulhi32(unsigned int x, unsigned int y)
{
  return (unsigned int)(((unsigned long long)x * y)>>32);
}
```
囚心锁ツ

2021-01-04 19:38
I don't think there's a way to do this in standard C/C++ that is better than what you already have. What I'd do is write a simple assembly wrapper that returns the result you want.

You're not asking about Windows, but as an example: even though Windows has an API that sounds like it does what you want (a 32 by 32 bit multiply that yields the full 64 bit result), it implements that multiply as a macro that does exactly what you're doing:
```
#define UInt32x32To64( a, b ) (ULONGLONG)((ULONGLONG)(DWORD)(a) * (DWORD)(b))
```
一个人的身影

2021-01-04 19:46

On 32-bit Intel, a multiply writes its output to two registers (EDX:EAX), so the full 64 bits are available whether you want them or not. It's just a question of whether the compiler is smart enough to take advantage of that.

Modern compilers do amazing things, so I suggest experimenting some more with optimization flags, at least on Intel. You would expect the optimizer to know that the processor produces a 64-bit value from a 32 by 32 bit multiply.

That said, I once tried to get the compiler to use both the quotient and the remainder of a single division, and the old Microsoft compiler from 1998 was not smart enough to realize that the same instruction produced both results.
