How to use GCC 4.6.0 libquadmath and __float128 on x86 and x86_64

I have a medium-sized C99 program that uses the long double type (80-bit) for floating-point computation. I want to improve precision with the new GCC 4.6 extension __float128. As I understand it, this is software-emulated 128-bit precision math.

How should I convert my program from the classic 80-bit long double to 128-bit quad floats with software emulation of full precision? What do I need to change: compiler flags, sources?

My program reads full-precision values with strtod, performs many different operations on them (+, -, *, /, sin, cos, exp and others from <math.h>) and prints them with printf.

PS: Although __float128 appears to be documented only for Fortran (REAL*16), libquadmath is written in C and uses __float128. I'm unsure whether GCC will convert operations on __float128 into runtime library calls, and I'm unsure how to migrate my sources from long double to __float128.

PPS: There is documentation for GCC's C language mode: http://gcc.gnu.org/onlinedocs/gcc/Floating-Types.html

"GNU C compiler supports ... 128 bit (TFmode) floating types. Support for additional types includes the arithmetic operators: add, subtract, multiply, divide; unary arithmetic operators; relational operators; equality operators ... __float128 types are supported on i386, x86_64"

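To illustrate the quoted support for built-in operators, here is a minimal sketch, assuming GCC 4.6 or newer on x86/x86_64 (ld_resolves and q_resolves are hypothetical names). It uses only + and > on __float128, with no library calls, and checks whether adding a tiny value to 1.0 is still visible in each type.

```c
/* Sketch, assuming GCC >= 4.6 on x86/x86_64. Only built-in operators are
   used, so libquadmath is not needed; GCC lowers the __float128 operations
   to libgcc soft-float helper calls. */

/* Nonzero if 1 + tiny still compares greater than 1 in long double. */
int ld_resolves(long double tiny)
{
    long double one = 1.0L;
    return one + tiny > one;
}

/* The same test in __float128 ('+' and '>' become helper calls). */
int q_resolves(__float128 tiny)
{
    __float128 one = 1.0Q;
    return one + tiny > one;
}
```

On x86, ld_resolves(0x1p-100L) should return 0 because the 80-bit long double has a 64-bit significand, while q_resolves((__float128)0x1p-100) should return 1 because __float128 has a 113-bit significand.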
How should I convert my program from the classic 80-bit long double to 128-bit quad floats with software emulation of full precision? What do I need to change: compiler flags, sources?

You need a recent toolchain: a GCC version that supports the __float128 type (4.6 or newer) and libquadmath (supported only on x86 and x86_64 targets, and also on IA64 and HPPA with newer GCC versions). Add the linker flag -lquadmath; a "cannot find -lquadmath" error means that libquadmath is not installed.

Add the #include <quadmath.h> header to get the macro and function declarations.
Change all long double variable definitions to __float128.
Complex variables may be changed to the __complex128 type (from quadmath.h) or declared directly with typedef _Complex float __attribute__((mode(TC))) _Complex128;
All basic arithmetic operations are handled automatically by GCC (converted into calls to libgcc helper functions of the form __*tf3(), such as __addtf3() and __multf3()).
If you use any LDBL_* macros, replace them with the corresponding FLT128_* macros (full list http://gcc.gnu.org/onlinedocs/libquadmath/Typedef-and-constants.html#Typedef-and-constants)
If you need specific constants like pi (M_PI) or e (M_E) in quadruple precision, use the predefined constants with the q suffix (M_*q), such as M_PIq and M_Eq (full list http://gcc.gnu.org/onlinedocs/libquadmath/Typedef-and-constants.html#Typedef-and-constants)
Write user-defined constants with the Q suffix, like 1.3000011111111Q; without the suffix the literal is a double and is rounded to double precision before being converted.
Replace all math function calls with their *q versions, such as sqrtq() and sinq() (full list http://gcc.gnu.org/onlinedocs/libquadmath/Math-Library-Routines.html#Math-Library-Routines)
Read a quad-float from a string with __float128 strtoflt128 (const char *s, char **sp) - http://gcc.gnu.org/onlinedocs/libquadmath/strtoflt128.html#strtoflt128 (Warning: older versions of libquadmath may have bugs in strtoflt128, so double-check the results.)
Print __float128 values with the quadmath_snprintf function. On Linux distributions with a recent glibc, libquadmath automatically registers itself to handle the Q (and possibly q) length modifier of the a, A, e, E, f, F, g, G conversion specifiers in all printf/sprintf calls, just as L is used for long double. Example: printf ("%Qe", 1.2Q), http://gcc.gnu.org/onlinedocs/libquadmath/quadmath_005fsnprintf.html#quadmath_005fsnprintf

Note also that since 4.6, gfortran will use the __float128 type for DOUBLE PRECISION if the option -fdefault-real-8 is given without -fdefault-double-8. This can be a problem, since 128-bit floating point is much slower than the standard long double on many platforms because it is computed in software. (Thanks to the post by glennklockwood http://glennklockwood.blogspot.com/2014/02/linux-perf-libquadmath-and-gfortrans.html)
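To see the cost of software emulation on a given machine from C, a rough timing sketch like the following can help. The helper names are hypothetical, it measures CPU time with clock(), and it claims no particular ratio, since that depends on the CPU and compiler. On x86, long double arithmetic runs on the x87 unit, while the __float128 operations go through libgcc soft-float routines.

```c
#include <time.h>

/* CPU seconds for n multiply-add steps in long double (x87 on x86). */
double seconds_long_double(long n)
{
    volatile long double acc = 0.0L;   /* volatile keeps the loop from being folded */
    const long double x = 1.0000001L;
    clock_t start = clock();
    for (long i = 0; i < n; i++)
        acc = acc * x + 1.0L;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}

/* Same loop in __float128 (software-emulated, no libquadmath needed). */
double seconds_float128(long n)
{
    volatile __float128 acc = 0.0Q;
    const __float128 x = 1.0000001Q;
    clock_t start = clock();
    for (long i = 0; i < n; i++)
        acc = acc * x + 1.0Q;
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling both with the same n, for example 10000000, and dividing the results gives a rough slowdown factor for the local machine.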

Source: https://stackoverflow.com/questions/6457385/how-to-use-gcc-4-6-0-libquadmath-and-float128-on-x86-and-x86-64

Tags: gcc, floating-point, quad, 128-bit, floating-point-precision