Efficient computation of the high order bits of a 32 bit integer multiplication

青春惊慌失措 2021-01-04 19:24

Many CPUs have single assembly opcodes for returning the high order bits of a 32 bit integer multiplication. Normally multiplying two 32 bit integers produc

3 Answers
  • 2021-01-04 19:31

    gcc 4.3.2, with -O1 optimisation or higher, translated your function exactly as you showed it to IA32 assembly like this:

    umulhi32:
            pushl   %ebp
            movl    %esp, %ebp
            movl    12(%ebp), %eax
            mull    8(%ebp)
            movl    %edx, %eax
            popl    %ebp
            ret
    

    This performs a single 32 bit mull and moves the high 32 bits of the result (from %edx) into the return value.

    That's what you wanted, right? Sounds like you just need to turn up the optimisation on your compiler ;) It's possible you could push the compiler in the right direction by eliminating the intermediate variable:

    unsigned int umulhi32(unsigned int x, unsigned int y)
    {
      return (unsigned int)(((unsigned long long)x * y)>>32);
    }
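
    As a quick sanity check of what "high 32 bits" means here, the sketch below splits the 64 bit product into its two halves and compares them against values worked out by hand. The names mulhi_u32, mullo_u32 and mulhi_u32_selfcheck are hypothetical and not from the question; the fixed-width types from <stdint.h> are assumed to be available.

```c
#include <assert.h>
#include <stdint.h>

/* High 32 bits of the 64-bit product x * y (same idea as umulhi32,
   written with fixed-width types). */
uint32_t mulhi_u32(uint32_t x, uint32_t y)
{
    return (uint32_t)(((uint64_t)x * y) >> 32);
}

/* Low 32 bits, i.e. what a plain 32-bit multiply would return. */
uint32_t mullo_u32(uint32_t x, uint32_t y)
{
    return (uint32_t)((uint64_t)x * y);
}

/* Hypothetical self-check using products computed by hand. */
void mulhi_u32_selfcheck(void)
{
    /* 0xFFFFFFFF * 0xFFFFFFFF = 0xFFFFFFFE00000001 */
    assert(mulhi_u32(0xFFFFFFFFu, 0xFFFFFFFFu) == 0xFFFFFFFEu);
    assert(mullo_u32(0xFFFFFFFFu, 0xFFFFFFFFu) == 0x00000001u);

    /* 0x80000000 * 2 = 0x100000000 */
    assert(mulhi_u32(0x80000000u, 2u) == 1u);
    assert(mullo_u32(0x80000000u, 2u) == 0u);
}
```

    On x86 the two halves correspond to %edx (high) and %eax (low) after the mull.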
    
  • 2021-01-04 19:38

    I don't think there's a way to do this in standard C/C++ better than what you already have. What I'd do is write up a simple assembly wrapper that returned the result you want.
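
    A minimal sketch of such a wrapper, assuming GCC or Clang extended inline assembly on an x86 target. The name umulhi32_asm and the self-check function are hypothetical, and other compilers or architectures fall through to the portable 64 bit multiply:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical wrapper: returns the high 32 bits of x * y. */
uint32_t umulhi32_asm(uint32_t x, uint32_t y)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
    uint32_t lo, hi;
    /* mull multiplies %eax by its operand; the 64-bit product
       ends up in %edx (high half) and %eax (low half). */
    __asm__("mull %3"
            : "=a"(lo), "=d"(hi)
            : "0"(x), "rm"(y)
            : "cc");
    (void)lo; /* low half is not needed here */
    return hi;
#else
    /* Portable fallback for everything else. */
    return (uint32_t)(((uint64_t)x * y) >> 32);
#endif
}

/* Hypothetical self-check: the wrapper must agree with plain C. */
void umulhi32_asm_selfcheck(void)
{
    uint32_t a = 0xDEADBEEFu, b = 0x12345678u;
    assert(umulhi32_asm(a, b) == (uint32_t)(((uint64_t)a * b) >> 32));
}
```

    Inline assembly can block some compiler optimisations (for example constant folding), so it is worth comparing the generated code against the plain C version before committing to it.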

    You aren't asking about Windows, but as an example: Windows has an API that sounds like it does what you want (a 32 by 32 bit multiply that yields the full 64 bit result), yet it is implemented as a macro that does the same thing you're already doing:

    #define UInt32x32To64( a, b ) (ULONGLONG)((ULONGLONG)(DWORD)(a) * (DWORD)(b))
    
  • 2021-01-04 19:46

    On 32 bit Intel, a multiply writes its output to two registers. That is, the full 64 bits are available whether you want them or not. It's just a question of whether the compiler is smart enough to take advantage of that.

    Modern compilers do amazing things, so my suggestion is to experiment with optimization flags some more, at least on Intel. You would think that the optimizer might know that the processor produces a 64 bit value from 32 by 32 bits.

    That said, at some point I tried to get the compiler to use the remainder as well as the quotient from a single division, but the old Microsoft compiler from 1998 was not smart enough to realize the same instruction produced both results.
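
    For illustration, the division case can be written as below. The struct udivmod_t and the function names are hypothetical; div() is the standard C library function from <stdlib.h>. Whether the two operators end up as a single divide instruction depends on the compiler and optimisation level, so this is only a sketch of the pattern, not a guarantee.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical pair type holding both results of one division. */
typedef struct {
    unsigned int quot;
    unsigned int rem;
} udivmod_t;

/* Quotient and remainder of the same operands, written next to each
   other so an optimising compiler has the chance to compute both
   with one divide instruction. */
udivmod_t udivmod(unsigned int n, unsigned int d)
{
    udivmod_t r;
    r.quot = n / d;
    r.rem  = n % d;
    return r;
}

/* Hypothetical self-check; also shows the standard div() for signed int. */
void udivmod_selfcheck(void)
{
    udivmod_t u = udivmod(17u, 5u);
    assert(u.quot == 3u && u.rem == 2u);

    div_t s = div(17, 5);
    assert(s.quot == 3 && s.rem == 2);
}
```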
