C++ fast division/mod by 10^x

我寻月下人不归 2020-12-03 05:13

In my program I use a lot of integer division by 10^x and integer modulo by powers of 10.

For example:

unsigned __int64 a = 12345;   // MSVC's 64-bit unsigned integer type
a = a / 100;                  // a == 123
...

10 answers
  • 2020-12-03 05:20

    You could also take a look at the libdivide project. It is designed to speed up integer division in the general case, where the divisor is only known at run time but is reused many times.
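
The core idea behind libdivide is to compute a reciprocal of the run-time divisor once and then replace every division with a multiplication and a shift. The sketch below is NOT libdivide's API or its exact algorithm; it is a simplified, hypothetical `RuntimeDivider` for 32-bit unsigned values that stores M = ceil(2^64 / d) and takes the high 64 bits of M * n. It assumes d >= 2 and a compiler that offers the non-standard `unsigned __int128` extension (GCC and Clang do; MSVC does not).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of libdivide): divides 32-bit unsigned
// values by a divisor d known only at run time, using one precomputed
// 64-bit reciprocal M = ceil(2^64 / d).
// Assumes d >= 2 and GCC/Clang's non-standard unsigned __int128.
struct RuntimeDivider {
    uint32_t d;
    uint64_t m;

    explicit RuntimeDivider(uint32_t divisor) : d(divisor), m(0) {
        // For d == 1 the reciprocal would be 2^64 and wrap around to 0.
        assert(divisor >= 2);
        m = UINT64_MAX / divisor + 1;  // equals ceil(2^64 / d) for d >= 2
    }

    // floor(n / d): the high 64 bits of the 128-bit product m * n.
    uint32_t div(uint32_t n) const {
        return static_cast<uint32_t>((static_cast<unsigned __int128>(m) * n) >> 64);
    }

    // n mod d, recovered from the quotient (q * d <= n, so no overflow).
    uint32_t mod(uint32_t n) const { return n - div(n) * d; }
};
```

Constructing the divider still costs one real division, so this only pays off when the same divisor is reused for many values.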

  • 2020-12-03 05:21

    If the divisor is an explicit compile-time constant (i.e. if your x in 10^x is a compile-time constant), there's absolutely no point in using anything other than the language-provided / and % operators. If there is a meaningful way to speed them up for explicit powers of 10, any self-respecting compiler will know how to do that and will do it for you.
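
One way to keep x a compile-time constant even when different call sites need different exponents is to make it a template parameter, so each instantiation divides by a constant. This is a minimal C++14 sketch; `pow10_v`, `div_pow10` and `mod_pow10` are hypothetical names. With optimization enabled, GCC and Clang generally turn such constant divisions into multiply-and-shift sequences, but it is worth checking the generated assembly for your target.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: 10^X evaluated at compile time.
template <unsigned X>
constexpr uint64_t pow10_v() {
    static_assert(X <= 19, "10^20 does not fit in uint64_t");
    uint64_t p = 1;
    for (unsigned i = 0; i < X; ++i) p *= 10;
    return p;
}

// Hypothetical helpers: plain / and % by a compile-time constant divisor,
// which the optimizer is free to replace with cheaper instructions.
template <unsigned X>
constexpr uint64_t div_pow10(uint64_t n) {
    constexpr uint64_t divisor = pow10_v<X>();
    return n / divisor;
}

template <unsigned X>
constexpr uint64_t mod_pow10(uint64_t n) {
    constexpr uint64_t divisor = pow10_v<X>();
    return n % divisor;
}
```

For example, `div_pow10<2>(12345)` yields 123 and `mod_pow10<2>(12345)` yields 45.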

    The only situation in which you might consider a "custom" implementation (aside from a dumb compiler) is when x is a run-time value. In that case you'd need some kind of decimal analogue of bit shifting and bitwise AND. On a binary machine a speedup is probably possible, but I doubt you'll be able to achieve anything practically meaningful. (If the numbers were stored in binary-coded decimal (BCD) format, it would be easy, but in "normal" cases it isn't.)
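
To illustrate the BCD remark: in packed BCD every decimal digit occupies its own 4-bit nibble, so dividing by 10^x becomes a right shift by 4x bits (a "decimal shift") and taking the value mod 10^x becomes a mask of the low 4x bits (a "decimal AND"). This sketch assumes values of at most 8 digits packed into a `uint32_t`, and all function names are hypothetical. The conversions to and from BCD are themselves loops of ordinary divisions and multiplications, which is why this only helps if the data already lives in BCD.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for packed BCD: one decimal digit per 4-bit nibble,
// least significant digit in the lowest nibble; at most 8 digits.
inline uint32_t to_bcd(uint32_t n) {
    assert(n < 100000000u);  // more digits would need shifts of 32 or more
    uint32_t bcd = 0;
    for (unsigned shift = 0; n != 0; shift += 4, n /= 10)
        bcd |= (n % 10) << shift;
    return bcd;
}

inline uint32_t from_bcd(uint32_t bcd) {
    uint32_t n = 0;
    for (uint32_t scale = 1; bcd != 0; scale *= 10, bcd >>= 4)
        n += (bcd & 0xF) * scale;
    return n;
}

// Division by 10^x: drop the x lowest nibbles.
inline uint32_t bcd_div_pow10(uint32_t bcd, unsigned x) {
    assert(x <= 7);  // shifting a 32-bit value by 32 is undefined
    return bcd >> (4 * x);
}

// Remainder mod 10^x: keep only the x lowest nibbles.
inline uint32_t bcd_mod_pow10(uint32_t bcd, unsigned x) {
    assert(x <= 7);
    return bcd & ((uint32_t{1} << (4 * x)) - 1);
}
```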

  • 2020-12-03 05:21

    In fact, you don't need to do anything. The compiler is smart enough to optimize multiplication and division by constants. You can find many examples here:

    • Why does GCC use multiplication by a strange number in implementing integer division?
    • Divide by 10 using bit shifts?
    • Fast Division on GCC/ARM

    You can even divide by 10 with a fast divide by 5 followed by a right shift by 1.
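
A sketch of what the compiler typically does for a constant divisor, assuming 32-bit unsigned operands: n / 10 becomes a 64-bit multiplication by 0xCCCCCCCD (which is ceil(2^35 / 10)) followed by a right shift by 35. The same constant also equals ceil(2^34 / 5), so dividing by 5 with a shift of 34 and then shifting right by one more bit gives the identical result, which is the divide-by-5-then-shift idea. The function names are hypothetical, the constants are valid only for `uint32_t` inputs, and in real code you would simply write n / 10 and let the compiler do this.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers showing the multiply-and-shift form of constant
// division. 0xCCCCCCCD == ceil(2^34 / 5) == ceil(2^35 / 10).
// Valid for every 32-bit unsigned n; the product fits in 64 bits.
constexpr uint32_t div5_magic(uint32_t n) {
    return static_cast<uint32_t>((uint64_t{n} * 0xCCCCCCCDu) >> 34);
}

constexpr uint32_t div10_magic(uint32_t n) {
    return static_cast<uint32_t>((uint64_t{n} * 0xCCCCCCCDu) >> 35);
}

// For unsigned n, n / 10 == (n / 5) / 2, so a divide by 5 plus a shift works.
constexpr uint32_t div10_via_div5(uint32_t n) { return div5_magic(n) >> 1; }

// n % 10 recovered from the quotient.
constexpr uint32_t mod10_magic(uint32_t n) { return n - div10_magic(n) * 10; }
```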

  • 2020-12-03 05:22

    Not unless your architecture supports Binary Coded Decimal, and even then only with a lot of messy assembly.

  • 2020-12-03 05:25

    If your runtime is genuinely dominated by 10^x-related operations, you could just use a base-10 integer representation in the first place.

    In most situations, though, I'd expect the slowdown of all other integer operations (and the reduced precision or extra memory use) to outweigh the faster 10^x operations.

  • 2020-12-03 05:28

    On a different note, it might make more sense to write a proper divide-by-N routine in assembly. Compilers can't always produce the most efficient code (though in most cases they do rather well). So if you're working in a low-level microcontroller environment, consider a hand-written asm routine.

    #define BitWise_Div10(result, n) {      \
        /*;n = (n >> 1) + (n >> 2);*/           \
        __asm   mov     eax, dword ptr[n]       \
        __asm   mov     ecx,eax                 \
        __asm   sar     eax,1                   \
        __asm   sar     ecx,2                   \
        __asm   add     ecx,eax                 \
        /*;n += n < 0 ? 9 : 2;*/                \
        __asm   xor     eax,eax                 \
        /* xor clears SF, so re-test the sign */\
        __asm   test    ecx,ecx                 \
        __asm   setns   al                      \
        __asm   dec     eax                     \
        __asm   and     eax,7                   \
        __asm   add     eax,2                   \
        __asm   add     ecx,eax                 \
        /*;n = n + (n >> 4);*/                  \
        __asm   mov     eax,ecx                 \
        __asm   sar     eax,4                   \
        __asm   add     ecx,eax                 \
        /*;n = n + (n >> 8);*/                  \
        __asm   mov     eax,ecx                 \
        __asm   sar     eax,8                   \
        __asm   add     ecx,eax                 \
        /*;n = n + (n >> 16);*/                 \
        __asm   mov     eax,ecx                 \
        __asm   sar     eax,10h                 \
        __asm   add     eax,ecx                 \
        /*;return n >> 3;}*/                    \
        __asm   sar     eax,3                   \
        __asm   mov     dword ptr[result], eax  \
    }
    

    Usage:

    int x = 12399;
    int r;
    BitWise_Div10(r, x); // r = x / 10
    // r == 1239
    

    Again, this is just a note. It is better suited to chips that really do have slow division. On modern processors with modern compilers, division (especially by a constant) is usually optimized in very clever ways.
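
The `__asm` macro above builds only with MSVC when targeting 32-bit x86. For comparison, here is a portable C++ shift-and-add division by 10. It is not a line-by-line translation of the macro: it is the unsigned variant given in Hacker's Delight, which replaces the signed "+2 / +9" bias with a final remainder check. `divu10` is a hypothetical name, and the routine assumes 32-bit unsigned input.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical portable shift-and-add division by 10 (unsigned 32-bit).
// The shifts build an estimate of n * 0.8 / 8 == n / 10 that may be one
// too small; the remainder check corrects it. No multiply/divide needed.
constexpr uint32_t divu10(uint32_t n) {
    uint32_t q = (n >> 1) + (n >> 2);     // q ~= n * 0.75
    q += q >> 4;                          // q ~= n * 0.797
    q += q >> 8;                          // q ~= n * 0.799988
    q += q >> 16;                         // q ~= n * 0.8
    q >>= 3;                              // q ~= n / 10, possibly 1 too small
    uint32_t r = n - ((q << 2) + q) * 2;  // r = n - 10 * q
    return q + (r > 9 ? 1u : 0u);         // fix the estimate if it was low
}
```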
