While investigating floating-point exception status flags, I came across the curious case of a status flag FE_UNDERFLOW
set when not expected.
This is similar to When does underflow occur? yet goes into a corner case that may be a C specification issue or FP hardware defect.
// pseudo code
// s bias_expo implied "mantissa"
w = smallest_normal; // 0 000...001 (1) 000...000
x = w * 2; // 0 000...010 (1) 000...000
y = next_smaller(x); // 0 000...001 (1) 111...111
round_mode(FE_TONEAREST);
clear_status_flags();
z = y/2; // 0 000...001 (1) 000...000
FE_UNDERFLOW is set!?
I did not expect FE_UNDERFLOW
to be set as z
above is normal, not a sub-normal.
I expect FE_UNDERFLOW
when the result of an earlier floating-point operation was subnormal with a loss of precision. In this case there is a loss of precision.
I tried this with my float
and long double
and had the same result.
After much investigation, I noted that __STDC_IEC_559__
is not defined.
Questions
If
__STDC_IEC_559__
is defined, what is the correct state of underflow in this case?With lack of a defined
__STDC_IEC_559__
am I stuck with "Implementations that do not define__STDC_IEC_559__
are not required to conform to these specifications." C11 or is there some C specification that indicates this result is incorrect?Since this is certainly a result of my hardware (processor), your result may differ and that would be interesting to know.
Follows is some test code that demos this. At first I suspected it is because FLT_EVAL_METHOD = 2
on my machine, yet then I tried similar code with long double
and the same result.
// These 2 includes missing in original post, yet in my true test code
#include <float.h>
#include <math.h>
#include <fenv.h>
#include <stdio.h>
#include <stdint.h>
#define N (sizeof excepts/sizeof excepts[0])
void Report_IEC_FP_exception_status_flags(const char *s) {
printf("%s", s);
int excepts[] = { //
FE_DIVBYZERO, FE_INEXACT, FE_INVALID, FE_OVERFLOW, FE_UNDERFLOW, };
const char *excepts_str[N] = { //
"FE_DIVBYZERO", "FE_INEXACT", "FE_INVALID", "FE_OVERFLOW", "FE_UNDERFLOW", };
int excepts_val[N];
for (unsigned i = 0; i < N; i++) {
excepts_val[i] = fetestexcept(excepts[i]);
}
for (unsigned i = 0; i < N; i++) {
if (excepts_val[i]) printf(" %s", excepts_str[i]);
}
printf("\n");
fflush(stdout);
}
#undef N
void test2(float f, int round_mode, const char *name) {
union {
float f;
uint32_t u32;
} x = { .f = f};
printf("x:%+17a %08lX normal:%c round_mode:%d %s\n", //
f, (unsigned long) x.u32, isnormal(f) ? 'Y' : 'n', round_mode, name);
if (feclearexcept(FE_ALL_EXCEPT)) puts("Clear Fail");
Report_IEC_FP_exception_status_flags("Before:");
f /= 2;
Report_IEC_FP_exception_status_flags("After :");
printf("y:%+17a %08lX normal:%c\n\n",
f,(unsigned long) x.u32, isnormal(f) ? 'Y' : 'n');
}
Driver
// In same file as above
int main(void) {
#ifdef __STDC_IEC_559__
printf("__STDC_IEC_559__ = %d\n", __STDC_IEC_559__);
#else
printf("__STDC_IEC_559__ = not define\n");
#endif
float f = FLT_MIN;
printf("FLT_EVAL_METHOD = %d\n", FLT_EVAL_METHOD);
printf("FLT_MIN:%+17a\n", f);
f *= 2.0f;
test2(f, FE_TONEAREST, "FE_TONEAREST");
f = nextafterf(f, 0);
test2(f, FE_TONEAREST, "FE_TONEAREST"); // *** problem? ***
f = nextafterf(f, 0);
test2(f, FE_TONEAREST, "FE_TONEAREST");
}
Output
__STDC_IEC_559__ = not define
FLT_EVAL_METHOD = 2
FLT_MIN: +0x1p-126
x: +0x1p-125 01000000 normal:Y round_mode:0 FE_TONEAREST
Before:
After :
y: +0x1p-126 01000000 normal:Y
x: +0x1.fffffep-126 00FFFFFF normal:Y round_mode:0 FE_TONEAREST
Before:
After : FE_INEXACT FE_UNDERFLOW *** Why FE_UNDERFLOW? ***
y: +0x1p-126 00FFFFFF normal:Y *** Result is normal ***
x: +0x1.fffffcp-126 00FFFFFE normal:Y round_mode:0 FE_TONEAREST
Before:
After :
y: +0x1.fffffcp-127 00FFFFFE normal:n
Ref
Implementation notes:
GNU C11 (GCC) version 6.4.0 (i686-pc-cygwin) compiled by GNU C version 6.4.0, GMP version 6.1.2, MPFR version 3.1.5-p10, MPC version 1.0.3, isl version 0.14 or 0.13
glibc 2.26 released.
Intel Xeon W3530, 64-bit OS (Windows 7)
[Minor Update] The illustrative print of the quotient as an 32-bit hex number should have used y.u32
. This does not change the function under test
// printf("y:%+17a %08lX normal:%c\n\n",
// f,(unsigned long) x.u32, isnormal(f) ? 'Y' : 'n');
union {
float f;
uint32_t u32;
} y = { .f = f};
printf("y:%+17a %08lX normal:%c\n\n",
f,(unsigned long) y.u32, isnormal(f) ? 'Y' : 'n');
// ^^^^^
Although not intended as a self answer, input from various commenters @John Bollinger, @nwellnhof and further research leads to:
Can the floating-point status flag FE_UNDERFLOW set when the result is not sub-normal?
Yes, in narrow situations. See below.
"Underflow" occurs when:
The result underflows if the magnitude of the mathematical result is so small that the mathematical result cannot be represented, without extraordinary roundoff error, in an object of the specified type. C11 7.12.1 Treatment of error conditions
The z = y/2;
above is 1) inexact (due to rounding) and 2) maybe considered "too small".
Math
The z = y/2;
can be thought of going through 2 stages: dividing and rounding. The mathematical quotient, with unlimited precision, is less than the smallest normal number FLT_MIN
and more than the greatest sub-normal number nextafterf(FLT_MIN,0)
. Depending on rounding mode, the final answer is either one of those two. With FE_TONEAREST
, z
is assigned FLT_MIN
, a normal number.
Spec
The C spec below and to IEC 60559 indicate
The "underflow" floating-point exception is raised whenever a result is tiny (essentially subnormal or zero) and suffers loss of accuracy.358 C11 §F.10 7.
358 IEC 60559 allows different definitions of underflow. They all result in the same values, but differ on when the floating-point exception is raised.
and
Two definitions were allowed for the determination of the 'tiny' condition: before or after rounding the infinitely precise result to working precision, with unbounded exponent.
Annex U of 754r recommended that only tininess after rounding and inexact as loss of accuracy be a cause for underflow signal. wiki reference
(My emphasis)
Q & A
- If STDC_IEC_559 is defined, what is the correct state of underflow in this case?
The underflow flag may be set or left alone in this case. Either complies. There is a preference though, for not setting the underflow flag.
2 With lack of a defined STDC_IEC_559 am I stuck with "Implementations that do not define STDC_IEC_559 are not required to conform to these specifications." C11 or is there some C specification that indicates this result is incorrect?
The setting of the underflow flag result in not incorrect. The FP spec allows this behavior. It also allows to not set the underflow flag.
3 Since this is certainly a result of my hardware (processor), your result may differ and that would be interesting to know.
On another platform, where __STDC_IEC_559__ = not define
and FLT_EVAL_METHOD = 0
, the FE_INEXACT FE_UNDERFLOW
flags were both set, just like in the above tst case. The issue applies to float, double, long double
.
If the mathematical answer lies in the grey "Between" zone below, it will get rounding down to a sub-normal double
or up to the normal double
DBL_MIN
depending on its value and rounding mode. If rounded down, then FE_UNDERFLOW
is certainly set. If rounded up, then FE_UNDERFLOW
may be set or not depending on when determination of the 'tiny' condition is applied.
来源:https://stackoverflow.com/questions/46611633/can-the-floating-point-status-flag-fe-underflow-set-when-the-result-is-not-sub-n