In various contexts, for example for the argument reduction for mathematical functions, one needs to compute (a - K) / (a + K), where a is a positive v
If you can relax the API to return another variable that models the error, then the solution becomes much simpler:
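#include <math.h>

/* Returns the rounded quotient; *res receives an estimate of its rounding error. */
float foo(float a, float k, float *res)
{
    float ret = (a - k) / (a + k);
    /* fmaf evaluates (a-k) - ret*(a+k) with a single rounding */
    *res = fmaf(-ret, a + k, a - k) / (a + k);
    return ret;
}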
This solution only handles the rounding error of the division; it does not handle the loss of precision in a+k and a-k.
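To make that limitation concrete, here is a short derivation. The names s, d, q and the notation fl(x) for "x rounded to float" are my own shorthand, not part of the code, and I assume round-to-nearest with no overflow or underflow:

```latex
\begin{align*}
s &= \mathrm{fl}(a + k), \qquad d = \mathrm{fl}(a - k), \qquad q = \mathrm{fl}(d / s) \\
\mathtt{fmaf}(-q,\, s,\, d) &= \mathrm{fl}(d - q s) = d - q s
  \quad \text{(the division remainder is exactly representable)} \\
\frac{d}{s} &= q + \frac{d - q s}{s}, \qquad
  \mathtt{*res} = \mathrm{fl}\!\left(\frac{d - q s}{s}\right)
\end{align*}
```

So ret + *res is a good approximation of d / s, but d / s itself differs from the true (a - k) / (a + k) whenever s or d had to be rounded.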
To handle those errors, I think I need to use double precision, or use bit hacks to work in fixed point.
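A minimal sketch of the double-precision route, assuming double is IEEE binary64 and available at acceptable cost. The name foo_double is hypothetical. For float inputs, the sum and difference are exact in double as long as the magnitudes of a and k differ by less than roughly 2^28; beyond that they are rounded again, only much later than in float.

```c
#include <math.h>

/* Hypothetical variant of foo(): evaluates the quotient in double and
   splits it into a float head (returned) and a float tail (*res). */
float foo_double(float a, float k, float *res)
{
    double s = (double)a + (double)k;  /* exact unless a and k differ hugely in magnitude */
    double d = (double)a - (double)k;
    double q = d / s;                  /* one rounding, at double precision */
    float head = (float)q;             /* leading part of the quotient */
    *res = (float)(q - (double)head);  /* what the float result lost */
    return head;
}
```

The calling convention matches foo above: the return value is the float quotient and *res models its error, so head + *res carries more of the quotient than a single float can hold.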
The test code has been updated to artificially generate nonzero least significant bits in the input.
Test code: https://ideone.com/bHxAg8