问题
I would like to compute the sum, rounded up, of two IEEE 754 binary64 numbers. To that end I wrote the C99 program below:
#include <stdio.h>
#include <fenv.h>
#pragma STDC FENV_ACCESS ON
int main(int c, char *v[]){
fesetround(FE_UPWARD);
printf("%a\n", 0x1.0p0 + 0x1.0p-80);
}
However, if I compile and run my program with various compilers:
$ gcc -v … gcc version 4.2.1 (Apple Inc. build 5664) $ gcc -Wall -std=c99 add.c && ./a.out add.c:3: warning: ignoring #pragma STDC FENV_ACCESS 0x1p+0 $ clang -v Apple clang version 1.5 (tags/Apple/clang-60) Target: x86_64-apple-darwin10 Thread model: posix $ clang -Wall -std=c99 add.c && ./a.out add.c:3:14: warning: pragma STDC FENV_ACCESS ON is not supported, ignoring pragma [-Wunknown-pragmas] #pragma STDC FENV_ACCESS ON ^ 1 warning generated. 0x1p+0
It doesn't work! (I expected the result 0x1.0000000000001p0
).
Indeed, the computation was done at compile-time in the default round-to-nearest mode:
$ clang -Wall -std=c99 -S add.c && cat add.s add.c:3:14: warning: pragma STDC FENV_ACCESS ON is not supported, ignoring pragma [-Wunknown-pragmas] #pragma STDC FENV_ACCESS ON ^ 1 warning generated. … LCPI1_0: .quad 4607182418800017408 … callq _fesetround movb $1, %cl movsd LCPI1_0(%rip), %xmm0 leaq L_.str(%rip), %rdx movq %rdx, %rdi movb %cl, %al callq _printf … L_.str: .asciz "%a\n"
Yes, I did see the warning emitted by each compiler. I understand that turning the applicable optimizations on or off at the scale of the line may be tricky. I would still like, if that was at all possible, to turn them off at the scale of the file, which would be enough to resolve my question.
My question is: what command-line option(s) should I use with GCC or Clang so as to compile a C99 compilation unit that contains code intended to be executed with an FPU rounding mode other than the default?
Digression
While researching this question, I found this GCC C99 compliance page, containing the entry below, that I will just leave here in case someone else finds it funny. Grrrr.
floating-point | | environment access | N/A | Library feature, no compiler support required. in <fenv.h> | |
回答1:
I couldn't find any command line options that would do what you wanted. However, I did find a way to rewrite your code so that even with maximum optimizations (even architectural optimizations), neither GCC nor Clang compute the value at compile time. Instead, this forces them to output code that will compute the value at runtime.
C:
#include <fenv.h>
#include <stdio.h>
#pragma STDC FENV_ACCESS ON
// add with rounding up
double __attribute__ ((noinline)) addrup (double x, double y) {
int round = fegetround ();
fesetround (FE_UPWARD);
double r = x + y;
fesetround (round); // restore old rounding mode
return r;
}
int main(int c, char *v[]){
printf("%a\n", addrup (0x1.0p0, 0x1.0p-80));
}
This results in these outputs from GCC and Clang, even when using maximum and architectural optimizations:
gcc -S -x c -march=corei7 -O3
(Godbolt GCC):
addrup:
push rbx
sub rsp, 16
movsd QWORD PTR [rsp+8], xmm0
movsd QWORD PTR [rsp], xmm1
call fegetround
mov edi, 2048
mov ebx, eax
call fesetround
movsd xmm1, QWORD PTR [rsp]
mov edi, ebx
movsd xmm0, QWORD PTR [rsp+8]
addsd xmm0, xmm1
movsd QWORD PTR [rsp], xmm0
call fesetround
movsd xmm0, QWORD PTR [rsp]
add rsp, 16
pop rbx
ret
.LC2:
.string "%a\n"
main:
sub rsp, 8
movsd xmm1, QWORD PTR .LC0[rip]
movsd xmm0, QWORD PTR .LC1[rip]
call addrup
mov edi, OFFSET FLAT:.LC2
mov eax, 1
call printf
xor eax, eax
add rsp, 8
ret
.LC0:
.long 0
.long 988807168
.LC1:
.long 0
.long 1072693248
clang -S -x c -march=corei7 -O3
(Godbolt GCC):
addrup: # @addrup
push rbx
sub rsp, 16
movsd qword ptr [rsp], xmm1 # 8-byte Spill
movsd qword ptr [rsp + 8], xmm0 # 8-byte Spill
call fegetround
mov ebx, eax
mov edi, 2048
call fesetround
movsd xmm0, qword ptr [rsp + 8] # 8-byte Reload
addsd xmm0, qword ptr [rsp] # 8-byte Folded Reload
movsd qword ptr [rsp + 8], xmm0 # 8-byte Spill
mov edi, ebx
call fesetround
movsd xmm0, qword ptr [rsp + 8] # 8-byte Reload
add rsp, 16
pop rbx
ret
.LCPI1_0:
.quad 4607182418800017408 # double 1
.LCPI1_1:
.quad 4246894448610377728 # double 8.2718061255302767E-25
main: # @main
push rax
movsd xmm0, qword ptr [rip + .LCPI1_0] # xmm0 = mem[0],zero
movsd xmm1, qword ptr [rip + .LCPI1_1] # xmm1 = mem[0],zero
call addrup
mov edi, .L.str
mov al, 1
call printf
xor eax, eax
pop rcx
ret
.L.str:
.asciz "%a\n"
Now for the more interesting part: why does that work?
Well, when they (GCC and/or Clang) compile code, they try to find and replace values that can be computed at runtime. This is known as constant propagation. If you had simply written another function, constant propagation would cease to occur, since it isn't supposed to cross functions.
However, if they see a function that they could, in theory, substitute the code of in place of the function call, they may do so. This is known as function inlining. If function inlining will work on a function, we say that that function is (surprise) inlinable.
If a function always return the same results for a given set of inputs, then it is considered pure. We also say that it has no side effects (meaning it makes no changes to the environment).
Now, if a function is fully inlinable (meaning that it doesn't make any calls to external libraries excluding a few defaults included in GCC and Clang - libc
, libm
, etc.) and is pure, then they will apply constant propagation to the function.
In other words, if we don't want them to propagate constants through a function call, we can do one of two things:
- Make the function appear impure:
- Use the filesystem
- Do some bullshit magic with some random input from somewhere
- Use the network
- Use some syscall of some sort
- Call something from an external library unknown to GCC and/or Clang
- Make the function not fully inlinable
- Call something from an external library unknown to GCC and/or Clang
- Use
__attribute__ ((noinline))
Now, that last one is the easiest. As you may have surmised, __attribute__ ((noinline))
marks the function as non-inlinable. Since we can take advantage of this, all we have to do is make another function that does whatever computation we want, mark it with __attribute__ ((noinline))
, and then call it.
When it is compiled, they will not violate the inlining and, by extension, constant propagation rules, and therefore, the value will be computed at runtime with the appropriate rounding mode set.
来源:https://stackoverflow.com/questions/24541589/adding-two-floating-point-numbers