Built-in mod ('%') vs custom mod function: improve the performance of modulus operation

逝去的感伤 2020-12-09 06:43

Recently I learned that the mod ('%') operator is supposedly very slow, so I wrote a function that works just like a % b. But is it actually faster than the mod operator?

Here'

5 Answers
  • 2020-12-09 07:02

    Just contributing a little to this discussion. If you want to handle negative numbers, use the following function (it assumes y is positive):

    inline long long mod(const long long x, const long long y) {
        if (x >= y) {
            return x % y;
        } else if (x < 0) {
            return (x % y + y) % y;
        } else {
            return x;
        }
    }
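
A small sketch of why the extra branch matters: since C++11 the built-in % truncates toward zero, so a negative dividend gives a negative (or zero) remainder. The function is copied from the answer above; `is_canonical_remainder` is a hypothetical helper added only for illustration, and y > 0 is assumed throughout.

```cpp
// Same function as in the answer above; assumes y > 0.
inline long long mod(const long long x, const long long y) {
    if (x >= y) {
        return x % y;            // ordinary case: reduce with the built-in operator
    } else if (x < 0) {
        return (x % y + y) % y;  // shift the negative remainder into [0, y)
    } else {
        return x;                // 0 <= x < y: already reduced
    }
}

// Hypothetical helper (not from the answer): true when r lies in [0, y)
// and differs from x by a multiple of y.
inline bool is_canonical_remainder(long long x, long long y, long long r) {
    return r >= 0 && r < y && (x - r) % y == 0;
}
```

For example, -7 % 3 evaluates to -1 with the built-in operator, while mod(-7, 3) returns 2.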
    
  • 2020-12-09 07:13

    Most of the time, your micro-optimized code will not beat the compiler. I also don't know where the "wisdom" that the built-in % is slow comes from. It is as fast as the machine can compute it, with all the micro-optimizations the compiler can apply for you.

    Also note that measuring the performance of such very small pieces of code is not easy. The conditionals of a loop construct or the jitter of your time measurement might dominate your results. You can find talks on such issues by people such as Andrei Alexandrescu or Chandler Carruth on YouTube. I once wrote a micro-benchmarking framework for a project I was working on. There is really a lot to take care of, including external factors like the OS preempting your thread or moving it to another core.
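
To make two of those pitfalls concrete, here is a minimal sketch, not a full benchmarking framework. It repeats the measurement and keeps the best time, which tends to filter out runs disturbed by preemption, and it stores the result in a volatile variable so the compiler cannot discard the loop. The names `best_of`, `sum_of_remainders` and `time_builtin_mod` are hypothetical and chosen for illustration.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <limits>

// Hypothetical helper: runs `body` `repeats` times and returns the
// shortest observed wall-clock time in seconds.
template <typename F>
double best_of(int repeats, F&& body) {
    using clk = std::chrono::steady_clock;
    double best = std::numeric_limits<double>::infinity();
    for (int r = 0; r < repeats; ++r) {
        const auto t0 = clk::now();
        body();
        const auto t1 = clk::now();
        best = std::min(best, std::chrono::duration<double>(t1 - t0).count());
    }
    return best;
}

// Workload under test: sums n % divisor for n in [0, n_max).
inline std::uint32_t sum_of_remainders(std::uint32_t n_max, std::uint32_t divisor) {
    std::uint32_t sum = 0;
    for (std::uint32_t n = 0; n < n_max; ++n)
        sum += n % divisor;
    return sum;
}

// Times the workload; writing to a volatile sink keeps the result observable.
inline double time_builtin_mod(std::uint32_t n_max, std::uint32_t divisor,
                               std::uint32_t& checksum) {
    volatile std::uint32_t sink = 0;
    const double secs = best_of(5, [&] { sink = sum_of_remainders(n_max, divisor); });
    checksum = sink;
    return secs;
}
```

Even with these precautions the loop overhead is still part of what is measured, so the numbers are best read as relative comparisons.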

  • 2020-12-09 07:16

    According to Chandler Carruth's benchmarks at CppCon 2015, the fastest modulo operator (on x86, when compiled with Clang) is:

    int fast_mod(const int input, const int ceil) {
        // apply the modulo operator only when needed
        // (i.e. when the input is greater than or equal to the ceiling)
        return input >= ceil ? input % ceil : input;
        // NB: the assumption here is that the numbers are positive
    }
    

    I suggest that you watch the whole talk; he goes into more detail on why this method is faster than just using % unconditionally.
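
As a quick sanity check of the stated assumption, the sketch below compares fast_mod with the built-in operator over a small range of non-negative inputs. `fast_mod_matches_builtin` is a hypothetical helper written for this illustration, not something from the talk.

```cpp
// Same function as in the answer above.
int fast_mod(const int input, const int ceil) {
    // The division is only performed when input >= ceil.
    return input >= ceil ? input % ceil : input;
}

// Hypothetical check: true when fast_mod agrees with % for every
// input in [0, max_input] and every ceil in [1, max_ceil].
bool fast_mod_matches_builtin(int max_input, int max_ceil) {
    for (int ceil = 1; ceil <= max_ceil; ++ceil)
        for (int input = 0; input <= max_input; ++input)
            if (fast_mod(input, ceil) != input % ceil)
                return false;
    return true;
}
```

For a negative input the two disagree: fast_mod(-8, 7) returns -8 unchanged, whereas -8 % 7 is -1, which is why the positivity assumption matters.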

  • 2020-12-09 07:16

    This will likely be compiler and platform dependent.

    Still, I was curious, so I benchmarked it. On my system you appear to be correct: the custom mod function beats the built-in %. However, the method from @865719's answer (fast_mod below) is the fastest:

    #include <chrono>
    #include <iostream>
    
    class Timer
    {
        using clk = std::chrono::steady_clock;
        using microseconds = std::chrono::microseconds;
    
        clk::time_point tsb;
        clk::time_point tse;
    
    public:
    
        void clear() { tsb = tse = clk::now(); }
        void start() { tsb = clk::now(); }
        void stop() { tse = clk::now(); }
    
        friend std::ostream& operator<<(std::ostream& o, const Timer& timer)
        {
            return o << timer.secs();
        }
    
        // return time difference in seconds
        double secs() const
        {
            if(tse <= tsb)
                return 0.0;
            auto d = std::chrono::duration_cast<microseconds>(tse - tsb);
            return d.count() / 1000000.0;
        }
    };
    
    int mod(int a, int b)
    {
        int tmp=a/b;
        return a-(b*tmp);
    }
    
    int fast_mod(const int input, const int ceil) {
        // apply the modulo operator only when needed
        // (i.e. when the input is greater than or equal to the ceiling)
        return input < ceil ? input : input % ceil;
        // NB: the assumption here is that the numbers are positive
    }
    
    int main()
    {
        auto N = 1000000000U;
        unsigned sum = 0;
    
        Timer timer;
    
        for(auto times = 0U; times < 3; ++times)
        {
            std::cout << "     run: " << (times + 1) << '\n';
    
            sum = 0;
            timer.start();
            for(decltype(N) n = 0; n < N; ++n)
                sum += n % (N - n);
            timer.stop();
    
            std::cout << "       %: " << sum << " " << timer << "s" << '\n';
    
            sum = 0;
            timer.start();
            for(decltype(N) n = 0; n < N; ++n)
                sum += mod(n, N - n);
            timer.stop();
    
            std::cout << "     mod: " << sum << " " << timer << "s" << '\n';
    
            sum = 0;
            timer.start();
            for(decltype(N) n = 0; n < N; ++n)
                sum += fast_mod(n, N - n);
            timer.stop();
    
            std::cout << "fast_mod: " << sum << " " << timer << "s" << '\n';
        }
    }
    

    Build: GCC 5.1.1 (x86_64)

    g++ -std=c++14 -march=native -O3 -g0 ...
    

    Output:

         run: 1
           %: 3081207628 5.49396s
         mod: 3081207628 4.30814s
    fast_mod: 3081207628 2.51296s
         run: 2
           %: 3081207628 5.5522s
         mod: 3081207628 4.25427s
    fast_mod: 3081207628 2.52364s
         run: 3
           %: 3081207628 5.4947s
         mod: 3081207628 4.29646s
    fast_mod: 3081207628 2.56916s
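
One thing worth keeping in mind when reading these numbers: the divisor in this loop is N - n, and n < N - n holds for every n below N/2, so fast_mod skips the division in about half of the iterations. The sketch below simply counts those iterations for a smaller N; `iterations_without_division` is a hypothetical name used for illustration.

```cpp
#include <cstdint>

// Counts the iterations of `for n in [0, N)` with divisor N - n
// in which n < N - n, i.e. in which fast_mod returns n without dividing.
std::uint64_t iterations_without_division(std::uint32_t N) {
    std::uint64_t count = 0;
    for (std::uint32_t n = 0; n < N; ++n)
        if (n < N - n)
            ++count;
    return count;
}
```

With a different distribution of operands the share of skipped divisions, and therefore the speedup, could look quite different.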
    
  • 2020-12-09 07:21

    A programmer can often beat the performance of the remainder operation when they know things about the operands that the compiler doesn't. For example, if the base is likely to be a power of 2, but is not particularly likely to be larger than the value to be reduced, one could use something like:

    unsigned mod(unsigned int x, unsigned int y)
    {
      return y & (y-1) ? x % y : x & (y-1);
    }
    

    If the compiler expands the function in-line and the base is a constant power of 2, the compiler will replace the remainder operator with a bitwise AND, which is apt to be a major improvement. In cases where the base isn't a constant power of two, the generated code would need to do a little bit of computation before selecting whether to use the remainder operator, but in cases where the base happens to be a power of two the cost savings of the bitwise AND may exceed the cost of the conditional logic.
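
A short sketch to confirm that both paths give the same result as the built-in operator. The function is the one from this answer; `mod_matches_builtin` is a hypothetical helper added for illustration.

```cpp
// Same function as in the answer above.
unsigned mod(unsigned int x, unsigned int y) {
    // y & (y - 1) is zero exactly when y has at most one bit set,
    // in which case x % y equals x & (y - 1) for y >= 1.
    return y & (y - 1) ? x % y : x & (y - 1);
}

// Hypothetical check: true when mod agrees with % for every
// x in [0, max_x] and every y in [1, max_y].
bool mod_matches_builtin(unsigned max_x, unsigned max_y) {
    for (unsigned y = 1; y <= max_y; ++y)
        for (unsigned x = 0; x <= max_x; ++x)
            if (mod(x, y) != x % y)
                return false;
    return true;
}
```

For instance, mod(29, 8) takes the bitwise path (29 & 7) and mod(29, 6) takes the % path; both return 5.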

    Another scenario where a custom modulus function may help is when the base is a fixed constant for which the compiler hasn't made provisions to compute the remainder. For example, if one wants to compute x % 65521 on a platform which can perform rapid integer shifts and multiplies, one may observe that computing x -= (x>>16)*65521; will cause x to be much smaller but will not affect the value of x % 65521. Doing the operation a second time will reduce x to the range 0..65745, small enough that a single conditional subtraction will yield the correct remainder.

    Some compilers may know how to use such techniques to handle the % operator efficiently with a constant base, but for those that don't, the approach can be a useful optimization, especially when dealing with numbers larger than a machine word. [Observe that 65521 is 65536-15, so on a 16-bit machine one could reduce x with x = (x & 65535) + 15*(x >> 16). This is not as readable as the form which subtracts 65521 * (x >> 16), but it's easy to see how it could be handled efficiently on a 16-bit machine.]
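
The two reduction steps described above can be sketched as follows, assuming a 32-bit unsigned x. The function names `mod65521` and `mod65521_fold` are hypothetical; the bounds in the comments follow from x - (x >> 16) * 65521 being equal to (x & 65535) + 15 * (x >> 16).

```cpp
#include <cstdint>

// Subtractive form: each step removes a multiple of 65521,
// so x % 65521 is unchanged.
std::uint32_t mod65521(std::uint32_t x) {
    x -= (x >> 16) * 65521u;        // x is now at most 1048560
    x -= (x >> 16) * 65521u;        // x is now at most 65745
    if (x >= 65521u) x -= 65521u;   // single conditional subtraction
    return x;
}

// Folded form using 65521 == 65536 - 15; it computes the same
// intermediate values as the subtractive form.
std::uint32_t mod65521_fold(std::uint32_t x) {
    x = (x & 65535u) + 15u * (x >> 16);
    x = (x & 65535u) + 15u * (x >> 16);
    if (x >= 65521u) x -= 65521u;
    return x;
}
```

Whether this beats the code a given compiler emits for x % 65521 is something to measure rather than assume.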
