I have a medium-sized C99 program which uses the long double type (80-bit) for floating-point computation. I want to improve precision with the new GCC 4.6 extension __fl
How should I convert my program from the classic 80-bit long double to 128-bit quad floats with software emulation of full precision? What do I need to change: compiler flags, sources?
You need recent software: a GCC version with support for the __float128 type (4.6 and newer) and libquadmath (supported only on x86 and x86_64 targets; on IA64 and HPPA with newer GCC). You should add the linker flag -lquadmath (the linker error "cannot find -lquadmath" will show that you have no libquadmath installed).
- Add the #include <quadmath.h> header to have macro and function definitions.
- Change long double variable definitions to __float128.
- Complex variables can use the __complex128 type (declared in quadmath.h), or the type can be defined directly with typedef _Complex float __attribute__((mode(TC))) _Complex128;
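- Simple arithmetic operations are handled by GCC itself (they are converted to calls of helper functions such as __*tf3()).
- If you use any LDBL_* macros, replace them with FLT128_* (full list: http://gcc.gnu.org/onlinedocs/libquadmath/Typedef-and-constants.html#Typedef-and-constants)
- If you need constants such as pi (M_PI) or e (M_E) with quadruple precision, use the predefined constants with the q suffix (M_*q), like M_PIq and M_Eq (full list: http://gcc.gnu.org/onlinedocs/libquadmath/Typedef-and-constants.html#Typedef-and-constants)
- Write your own constants with the Q suffix, like 1.3000011111111Q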
- Replace math function calls with their *q versions, like sqrtq(), sinq() (full list: http://gcc.gnu.org/onlinedocs/libquadmath/Math-Library-Routines.html#Math-Library-Routines)
- Read quad floats from strings with __float128 strtoflt128 (const char *s, char **sp) - http://gcc.gnu.org/onlinedocs/libquadmath/strtoflt128.html#strtoflt128 (Warning: older versions of libquadmath may have bugs in strtoflt128, so double-check the results)
- Printing a __float128 is done with the help of the quadmath_snprintf function. On Linux distributions with a recent glibc, libquadmath automatically registers the function to handle the Q (possibly also q) length modifier of the a, A, e, E, f, F, g, G conversion specifiers in all printf/sprintf calls, the same way L works for long doubles. Example: printf ("%Qe", 1.2Q), http://gcc.gnu.org/onlinedocs/libquadmath/quadmath_005fsnprintf.html#quadmath_005fsnprintf

You should also know that since 4.6, Gfortran will use the __float128 type for DOUBLE PRECISION if the option -fdefault-real-8 was given and the option -fdefault-double-8 was not. This may be a problem, since a 128-bit long double is much slower than the standard long double on many platforms due to software computation. (Thanks to the post by glennglockwood: http://glennklockwood.blogspot.com/2014/02/linux-perf-libquadmath-and-gfortrans.html)