Optimizations for pow() with const non-integer exponent?

前端 未结 10 627
旧时难觅i
旧时难觅i 2020-12-04 09:17

I have hot spots in my code where I\'m doing pow() taking up around 10-20% of my execution time.

My input to pow(x,y) is very specific, so

相关标签:
10条回答
  • 2020-12-04 09:49

    Another answer because this is very different from my previous answer, and this is blazing fast. Relative error is 3e-8. Want more accuracy? Add a couple more Chebychev terms. It's best to keep the order odd as this makes for a small discontinuity between 2^n-epsilon and 2^n+epsilon.

    #include <stdlib.h>
    #include <math.h>
    
    // Returns x^(5/12) for x in [1,2), to within 3e-8 (relative error).
    // Want more precision? Add more Chebychev polynomial coefs.
    double pow512norm (
       double x)
    {
       static const int N = 8;
    
       // Chebychev polynomial terms.
       // Non-zero terms calculated via
       //   integrate (2/pi)*ChebyshevT[n,u]/sqrt(1-u^2)*((u+3)/2)^(5/12)
       //   from -1 to 1
       // Zeroth term is similar except it uses 1/pi rather than 2/pi.
       static const double Cn[N] = { 
           1.1758200232996901923,
           0.16665763094889061230,
          -0.0083154894939042125035,
           0.00075187976780420279038,
          // Wolfram alpha doesn't want to compute the remaining terms
          // to more precision (it times out).
          -0.0000832402,
           0.0000102292,
          -1.3401e-6,
           1.83334e-7};
    
       double Tn[N];
    
       double u = 2.0*x - 3.0;
    
       Tn[0] = 1.0;
       Tn[1] = u;
       for (int ii = 2; ii < N; ++ii) {
          Tn[ii] = 2*u*Tn[ii-1] - Tn[ii-2];
       }   
    
       double y = 0.0;
       for (int ii = N-1; ii >= 0; --ii) {
          y += Cn[ii]*Tn[ii];
       }   
    
       return y;
    }
    
    
    // Returns x^(5/12) to within 3e-8 (relative error).
    double pow512 (
       double x)
    {
       static const double pow2_512[12] = {
          1.0,
          pow(2.0, 5.0/12.0),
          pow(4.0, 5.0/12.0),
          pow(8.0, 5.0/12.0),
          pow(16.0, 5.0/12.0),
          pow(32.0, 5.0/12.0),
          pow(64.0, 5.0/12.0),
          pow(128.0, 5.0/12.0),
          pow(256.0, 5.0/12.0),
          pow(512.0, 5.0/12.0),
          pow(1024.0, 5.0/12.0),
          pow(2048.0, 5.0/12.0)
       };
    
       double s;
       int iexp;
    
       s = frexp (x, &iexp);
       s *= 2.0;
       iexp -= 1;
    
       div_t qr = div (iexp, 12);
       if (qr.rem < 0) {
          qr.quot -= 1;
          qr.rem += 12;
       }
    
       return ldexp (pow512norm(s)*pow2_512[qr.rem], 5*qr.quot);
    }
    

    Addendum: What's going on here?
    Per request, the following explains how the above code works.

    Overview
    The above code defines two functions, double pow512norm (double x) and double pow512 (double x). The latter is the entry point to the suite; this is the function that user code should call to calculate x^(5/12). The function pow512norm(x) uses Chebyshev polynomials to approximate x^(5/12), but only for x in the range [1,2]. (Use pow512norm(x) for values of x outside that range and the result will be garbage.)

    The function pow512(x) splits the incoming x into a pair (double s, int n) such that x = s * 2^n and such that 1≤s<2. A further partitioning of n into (int q, unsigned int r) such that n = 12*q + r and r is less than 12 lets me split the problem of finding x^(5/12) into parts:

    1. x^(5/12)=(s^(5/12))*((2^n)^(5/12)) via (uv)^a=(u^a)(v^a) for positive u,v and real a.
    2. s^(5/12) is calculated via pow512norm(s).
    3. (2^n)^(5/12)=(2^(12*q+r))^(5/12) via substitution.
    4. 2^(12*q+r)=(2^(12*q))*(2^r) via u^(a+b)=(u^a)*(u^b) for positive u, real a,b.
    5. (2^(12*q+r))^(5/12)=(2^(5*q))*((2^r)^(5/12)) via some more manipulations.
    6. (2^r)^(5/12) is calculated by the lookup table pow2_512.
    7. Calculate pow512norm(s)*pow2_512[qr.rem] and we're almost there. Here qr.rem is the r value calculated in step 3 above. All that is needed is to multiply this by 2^(5*q) to yield the desired result.
    8. That is exactly what the math library function ldexp does.

    Function Approximation
    The goal here is to come up with an easily computable approximation of f(x)=x^(5/12) that is 'good enough' for the problem at hand. Our approximation should be close to f(x) in some sense. Rhetorical question: What does 'close to' mean? Two competing interpretations are minimizing the mean square error versus minimizing the maximum absolute error.

    I'll use a stock market analogy to describe the difference between these. Suppose you want to save for your eventual retirement. If you are in your twenties, the best thing to do is to invest in stocks or stock market funds. This is because over a long enough span of time, the stock market on average beats any other investment scheme. However, we've all seen times when putting money into stocks is a very bad thing to do. If you are in your fifties or sixties (or forties if you want to retire young) you need to invest a bit more conservatively. Those downswings can wreak have on your retirement portfolio.

    Back to function approximation: As the consumer of some approximation, you are typically worried about the worst-case error rather than the performance "on average". Use some approximation constructed to give the best performance "on average" (e.g. least squares) and Murphy's law dictates that your program will spend a whole lot of time using the approximation exactly where the performance is far worse than average. What you want is a minimax approximation, something that minimizes the maximum absolute error over some domain. A good math library will take a minimax approach rather than a least squares approach because this lets the authors of the math library give some guaranteed performance of their library.

    Math libraries typically use a polynomial or a rational polynomial to approximate some function f(x) over some domain a≤x≤b. Suppose the function f(x) is analytic over this domain and you want to approximate the function by some polynomial p(x) of degree N. For a given degree N there exists some magical, unique polynomial p(x) such that p(x)-f(x) has N+2 extrema over [a,b] and such that the absolute values of these N+2 extrema are all equal to one another. Finding this magical polynomial p(x) is the holy grail of function approximators.

    I did not find that holy grail for you. I instead used a Chebyshev approximation. The Chebyshev polynomials of the first kind are an orthogonal (but not orthonormal) set of polynomials with some very nice features when it comes to function approximation. The Chebyshev approximation oftentimes is very close to that magical polynomial p(x). (In fact, the Remez exchange algorithm that does find that holy grail polynomial typically starts with a Chebyshev approximation.)

    pow512norm(x)
    This function uses Chebyshev approximation to find some polynomial p*(x) that approximates x^(5/12). Here I'm using p*(x) to distinguish this Chebyshev approximation from the magical polynomial p(x) described above. The Chebyshev approximation p*(x) is easy to find; finding p(x) is a bear. The Chebyshev approximation p*(x) is sum_i Cn[i]*Tn(i,x), where the Cn[i] are the Chebyshev coefficients and Tn(i,x) are the Chebyshev polynomials evaluated at x.

    I used Wolfram alpha to find the Chebyshev coefficients Cn for me. For example, this calculates Cn[1]. The first box after the input box has the desired answer, 0.166658 in this case. That's not as many digits as I would like. Click on 'more digits' and voila, you get a whole lot more digits. Wolfram alpha is free; there is a limit on how much computation it will do. It hits that limit on higher order terms. (If you buy or have access to mathematica you will be able to calculate those high-order coefficients to a high degree of precision.)

    The Chebyshev polynomials Tn(x) are calculated in the array Tn. Beyond giving something very close to magical polynomial p(x), another reason for using Chebyshev approximation is that the values of those Chebyshev polynomials are easily calculated: Start with Tn[0]=1 and Tn[1]=x, and then iteratively calculate Tn[i]=2*x*Tn[i-1] - Tn[i-2]. (I used 'ii' as the index variable rather than 'i' in my code. I never use 'i' as a variable name. How many words in the English language have an 'i' in the word? How many have two consecutive 'i's?)

    pow512(x)
    pow512 is the function that user code should be calling. I already described the basics of this function above. A few more details: The math library function frexp(x) returns the significand s and exponent iexp for the input x. (Minor issue: I want s between 1 and 2 for use with pow512norm but frexp returns a value between 0.5 and 1.) The math library function div returns the quotient and remainder for integer division in one swell foop. Finally, I use the math library function ldexp to put the three parts together to form the final answer.

    0 讨论(0)
  • 2020-12-04 09:49

    For exponents of 2.4, you could either make a lookup table for all your 2.4 values and lirp or perhaps higher-order function to fill in the in-betweem values if the table wasn't accurate enough (basically a huge log table.)

    Or, value squared * value to the 2/5s which could take the initial square value from the first half of the function and then 5th root it. For the 5th root, you could Newton it or do some other fast approximator, though honestly once you get to this point, your probably better off just doing the exp and log functions with the appropriate abbreviated series functions yourself.

    0 讨论(0)
  • 2020-12-04 09:53

    First off, using floats isn't going to buy much on most machines nowadays. In fact, doubles can be faster. Your power, 1.0/2.4, is 5/12 or 1/3*(1+1/4). Even though this is calling cbrt (once) and sqrt (twice!) it is still twice as fast as using pow(). (Optimization: -O3, compiler: i686-apple-darwin10-g++-4.2.1).

    #include <math.h> // cmath does not provide cbrt; C99 does.
    double xpow512 (double x) {
      double cbrtx = cbrt(x);
      return cbrtx*sqrt(sqrt(cbrtx));
    }
    
    0 讨论(0)
  • 2020-12-04 09:55

    The following is an idea you can use with any of the fast calculation methods. Whether it helps things go faster depends on how your data arrives. You can use the fact that if you know x and pow(x, n), you can use the rate of change of the power to compute a reasonable approximation of pow(x + delta, n) for small delta, with a single multiply and add (more or less). If successive values you feed your power functions are close enough together, this would amortize the full cost of the accurate calculation over multiple function calls. Note that you don't need an extra pow calculation to get the derivative. You could extend this to use the second derivative so you can use a quadratic, which would increase the delta you could use and still get the same accuracy.

    0 讨论(0)
提交回复
热议问题