Question
Currently, I am using a small lookup table with linear interpolation, which is quite fast and accurate enough (the maximum error is less than 0.001). However, I was wondering whether there is an approximation that is even faster.
Since the integer part of the exponent can be extracted and handled with bit shifts, the approximation only needs to work in the range [-1, 1]. I have tried to find a Chebyshev polynomial, but could not achieve good accuracy with low-order polynomials. I could probably live with a maximum error of around 0.01, but I did not get near that number. Higher-order polynomials are not an option, since they are much less efficient than my current lookup-table-based solution.
Answer 1:
Since no specific fixed-point format was stated, I will demonstrate a possible alternative to table lookup using s15.16 fixed-point arithmetic, which is fairly commonly used. The basic idea is to split the input a into an integral portion i and a fractional portion f, such that f is in [-0.5, 0.5], then use a minimax polynomial approximation for exp2(f) on [-0.5, 0.5] and perform the final scaling based on i.
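As a worked illustration of the split described above, here is a minimal sketch that isolates the rounding step. The helper names split_integral and split_fractional are hypothetical and used only for illustration; the expressions mirror the ones in the fixed_exp2() code in this answer, and the sketch assumes inputs small enough that adding 0x8000 does not overflow.

```c
#include <stdint.h>

/* Hypothetical helper: round an s15.16 value to the nearest integer,
   with the result still expressed in s15.16 format. Adding 0x8000 (0.5)
   and clearing the 16 fraction bits rounds half-way cases upward. */
int32_t split_integral (int32_t a)
{
    return (a + 0x8000) & ~0xffff;
}

/* Hypothetical helper: the remainder f = a - i, which lies in [-0.5, 0.5) */
int32_t split_fractional (int32_t a)
{
    return a - split_integral (a);
}
```

For example, a = 2.75 is 0x0002c000 in s15.16. The split yields i = 0x00030000 (3.0) and f = -0x4000 (-0.25), so exp2(2.75) is evaluated as exp2(-0.25) scaled by 2^3.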
Minimax approximations can be generated with tools such as Mathematica, Maple, or Sollya. If none of these tools are available, one could use a custom implementation of the Remez algorithm to generate minimax approximations.
The Horner scheme should be used to evaluate the polynomial. Since fixed-point arithmetic is used, the evaluation of the polynomial should scale operands to the maximum extent possible (i.e. without overflow) in intermediate steps to optimize the accuracy of the computation.
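To make the Horner scheme concrete, the following sketch evaluates the same cubic in double precision, leaving out the fixed-point scaling. The coefficients are copied from the comments in the fixed_exp2() code in this answer; the function name exp2_poly is an assumption made for illustration.

```c
/* Hypothetical floating-point reference for the cubic used by fixed_exp2().
   Horner's scheme starts from the highest-order coefficient and needs one
   multiply and one add per remaining coefficient. */
double exp2_poly (double f)
{
    double r = 5.5171669058037949e-2;   /* coefficient of f^3 */
    r = r * f + 2.4261112219321804e-1;  /* coefficient of f^2 */
    r = r * f + 6.9326098546062365e-1;  /* coefficient of f^1 */
    r = r * f + 9.9992807353939517e-1;  /* constant term */
    return r;
}
```

Each line corresponds to one step of the fixed-point evaluation, where the shifts additionally rescale the intermediate result so that it stays within 32 bits.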
The C code below assumes that right shifts applied to signed integer data types result in arithmetic shift operations, and therefore negative operands are shifted appropriately. This is not guaranteed by the ISO C standard, but in my experience it will work fine with various tool chains. In the worst case, inline assembly could be used to force generation of the desired arithmetic right shift instructions.
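If relying on implementation-defined shift behavior is a concern, one alternative to inline assembly is a small helper that only ever shifts non-negative values. This is a sketch of a commonly used idiom, not part of the original answer; the name asr32 is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: arithmetic right shift of x by n bits, 0 <= n <= 31.
   For negative x, ~x is non-negative, so shifting it is well defined;
   complementing again restores the sign and yields floor(x / 2^n). */
int32_t asr32 (int32_t x, int n)
{
    return (x < 0) ? ~(~x >> n) : (x >> n);
}
```

For instance, asr32 (-7, 1) yields -4, matching the rounding toward negative infinity that an arithmetic shift instruction produces.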
The output of the test included with the fixed_exp2() implementation below should look as follows:
testing fixed_exp2 with inputs in [-5.96484, 15)
max. rel. err = 0.000999758
This demonstrates that the desired error bound of 0.001 is met for inputs in the interval [-5.96484, 15).
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <math.h>

/* compute exp2(a) in s15.16 fixed-point arithmetic, -16 < a < 15 */
int32_t fixed_exp2 (int32_t a)
{
    int32_t i, f, r, s;
    /* split a = i + f, such that f in [-0.5, 0.5] */
    i = (a + 0x8000) & ~0xffff; // 0.5
    f = a - i;
    s = ((15 << 16) - i) >> 16;
    /* minimax approximation for exp2(f) on [-0.5, 0.5] */
    r = 0x00000e20;                 // 5.5171669058037949e-2
    r = (r * f + 0x3e1cc333) >> 17; // 2.4261112219321804e-1
    r = (r * f + 0x58bd46a6) >> 16; // 6.9326098546062365e-1
    /* the last step is done in unsigned arithmetic, because the sum
       exceeds INT32_MAX for f > 0 and signed overflow is undefined */
    return ((uint32_t)(r * f) + 0x7ffde4a3u) >> s; // 9.9992807353939517e-1
}

double fixed_to_float (int32_t a)
{
    return a / 65536.0;
}

int main (void)
{
    double a, res, ref, err, maxerr = 0.0;
    int32_t x, start, end;

    start = 0xfffa0900;
    end = 0x000f0000;
    printf ("testing fixed_exp2 with inputs in [%g, %g)\n",
            fixed_to_float (start), fixed_to_float (end));

    for (x = start; x < end; x++) {
        a = fixed_to_float (x);
        ref = exp2 (a);
        res = fixed_to_float (fixed_exp2 (x));
        err = fabs (res - ref) / ref;
        if (err > maxerr) {
            maxerr = err;
        }
    }
    printf ("max. rel. err = %g\n", maxerr);
    return EXIT_SUCCESS;
}
Source: https://stackoverflow.com/questions/36550388/power-of-2-approximation-in-fixed-point