Power of 2 approximation in fixed point

前端未结

关注

 1  1993

Currently, I am using a small lookup table and linear interpolation which is quite fast and also accurate enough (max error is less than 0.001). However I was wondering if t

相关标签:

1条回答

青春惊慌失措

2021-01-07 05:20
Since no specific fixed-point format was stated, I will demonstrate a possible alternative to table lookup using s15.16 fixed-point arithmetic, which is fairly commonly used. The basic idea is to split the input a into an integral portion i and a fractional portion f, such that f in [-0.5,0.5], then use a minimax polynomial approximation for exp2(f) on [-0.5, 0.5] and perform final scaling based on i.

Minimax approximations can be generated with tools such as Mathematica, Maple, or Sollya. If none of these tools are available, one could use a custom implementation of the Remez algorithm to generate minimax aproximations.

The Horner scheme should be used to evaluate the polynomial. Since fixed-point arithmetic is used, the evaluation of the polynomial should scale operands to the maximum extent possible (i.e. without overflow) in intermediate steps to optimized the accuracy of the computation.

The C code below assumes that right shifts applied to signed integer data types result in arithmetic shift operations, and therefore negative operands are shifted appropriately. This is not guaranteed by the ISO C standard, but in my experience it will work fine with various tool chains. In the worst case, inline assembly could be used to force generation of the desired arithmetic right shift instructions.

The output of the test included with the fixed_exp2() implementation below should look as follows:
```
testing fixed_exp2 with inputs in [-5.96484, 15)
max. rel. err = 0.000999758
```
This demonstrates that the desired error bound of 0.001 is met for inputs in the interval [-5.96484, 15).
```
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <math.h>

/* compute exp2(a) in s15.16 fixed-point arithmetic, -16 < a < 15 */
int32_t fixed_exp2 (int32_t a)
{
    int32_t i, f, r, s;
    /* split a = i + f, such that f in [-0.5, 0.5] */
    i = (a + 0x8000) & ~0xffff; // 0.5
    f = a - i;   
    s = ((15 << 16) - i) >> 16;
    /* minimax approximation for exp2(f) on [-0.5, 0.5] */
    r = 0x00000e20;                 // 5.5171669058037949e-2
    r = (r * f + 0x3e1cc333) >> 17; // 2.4261112219321804e-1
    r = (r * f + 0x58bd46a6) >> 16; // 6.9326098546062365e-1
    r = r * f + 0x7ffde4a3;         // 9.9992807353939517e-1
    return (uint32_t)r >> s;
}

double fixed_to_float (int32_t a)
{
    return a / 65536.0;
}

int main (void)
{
    double a, res, ref, err, maxerr = 0.0;
    int32_t x, start, end;

    start = 0xfffa0900;
    end = 0x000f0000;
    printf ("testing fixed_exp2 with inputs in [%g, %g)\n",  
            fixed_to_float (start), fixed_to_float (end));

    for (x = start; x < end; x++) {
        a = fixed_to_float (x);
        ref = exp2 (a);
        res = fixed_to_float (fixed_exp2 (x));
        err = fabs (res - ref) / ref;
        if (err > maxerr) {
            maxerr = err;
        }
    }
    printf ("max. rel. err = %g\n", maxerr);
    return EXIT_SUCCESS;
}
```
0 讨论(0)
发布评论:

提交评论
- 加载中...