More Precise Floating-Point Data Types than double?

情歌与酒 2020-12-31 15:20

In my project I have to compute division, multiplication, subtraction, addition on a matrix of double elements. The problem is that when the size of matrix incr

4 Answers
  • 2020-12-31 15:52

    Consider the order of operations: do the additions in sorted order, starting with the smallest values. With the same mantissa precision, this increases the overall accuracy of the result:

    1e00 + 1e-16 + ... + 1e-16 (1e16 times) = 1e00
    1e-16 + ... + 1e-16 (1e16 times) + 1e00 ≈ 2e00

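    The point is that adding a small number to a much larger one can make it disappear entirely. The gap between adjacent doubles near 1.0 is about 2.2e-16, so 1.0 + 1e-16 rounds straight back to 1.0. Adding the small values first lets them accumulate into something large enough to survive, so the second ordering reduces the numerical error.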
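    A minimal C sketch of the idea, assuming IEEE 754 double arithmetic without excess precision (e.g. SSE on x86-64) and no -ffast-math. It scales the example down to one 1.0 plus a million copies of 1e-16; the function names are illustrative, not from any library.

    ```c
    #include <math.h>
    #include <stdlib.h>
    #include <string.h>

    /* qsort comparator: orders doubles by increasing magnitude. */
    static int cmp_abs_ascending(const void *a, const void *b) {
        double x = fabs(*(const double *)a);
        double y = fabs(*(const double *)b);
        return (x > y) - (x < y);
    }

    /* Sums the values in the order they are given. */
    double sum_in_order(const double *v, size_t n) {
        double s = 0.0;
        for (size_t i = 0; i < n; i++)
            s += v[i];
        return s;
    }

    /* Sums a sorted copy of the values, smallest magnitude first.
       Returns NAN if the temporary copy cannot be allocated. */
    double sum_smallest_first(const double *v, size_t n) {
        if (n == 0)
            return 0.0;
        double *tmp = malloc(n * sizeof *tmp);
        if (tmp == NULL)
            return NAN;
        memcpy(tmp, v, n * sizeof *tmp);
        qsort(tmp, n, sizeof *tmp, cmp_abs_ascending);
        double s = sum_in_order(tmp, n);
        free(tmp);
        return s;
    }
    ```

    On an array holding 1.0 followed by 1,000,000 copies of 1e-16, sum_in_order returns exactly 1.0 while sum_smallest_first returns about 1.0000000001. Sorting costs O(n log n); compensated (Kahan) summation is a common alternative that avoids the sort, though the answer does not mention it.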

  • 2020-12-31 15:56

    Which floating-point types with more precision than double are available depends on your compiler and architecture.

    To get more than double precision, you may need to rely on a math library that supports arbitrary-precision calculations, such as GNU MPFR. These are generally much slower than native double arithmetic, though.
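    To see what a given compiler and target actually provide, you can query the standard <float.h> macros. A minimal sketch (the function names are illustrative); the comment lists typical results, not guarantees:

    ```c
    #include <float.h>
    #include <stdio.h>

    /* Significand width in bits (including any implicit bit), as
       reported by the compiler through <float.h>. */
    int double_mantissa_bits(void)      { return DBL_MANT_DIG; }
    int long_double_mantissa_bits(void) { return LDBL_MANT_DIG; }

    /* Prints what this build provides. Typical results: 53 / 64 with GCC or
       Clang on x86 (80-bit x87 format), 53 / 53 with MSVC (long double is
       the same as double), 53 / 113 on AArch64 Linux (IEEE quad). */
    void report_float_types(void) {
        printf("double:      %d-bit significand, %zu bytes\n",
               DBL_MANT_DIG, sizeof(double));
        printf("long double: %d-bit significand, %zu bytes\n",
               LDBL_MANT_DIG, sizeof(long double));
    }
    ```

    If LDBL_MANT_DIG equals DBL_MANT_DIG on your build, long double buys you nothing and a library is the only way to get more precision.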

  • 2020-12-31 15:59

    According to Wikipedia, the 80-bit "Intel" IEEE 754 extended-precision long double (80 bits, padded to 16 bytes in memory on x86-64) has a 64-bit mantissa with no implicit bit, which gets you 19.26 decimal digits. This has long been the most common format for long double, but it is not universal, and things have started to change: on AArch64 Linux, for example, long double is the 128-bit quad-precision format.
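    The decimal-digit figure follows from the significand width p: a p-bit binary significand carries about p · log10(2) decimal digits. For comparison with double (p = 53), the x87 format has p = 64:

    ```latex
    \begin{aligned}
    d &\approx p \log_{10} 2, \qquad \log_{10} 2 \approx 0.30103 \\
    d_{\text{double}} &\approx 53 \times 0.30103 \approx 15.955 \\
    d_{\text{x87}} &\approx 64 \times 0.30103 \approx 19.266
    \end{aligned}
    ```

    So the x87 format adds only a little over three decimal digits on top of double.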

    The newer 128-bit quad-precision format has a 112-bit mantissa plus an implicit bit, which gets you about 34 decimal digits. GCC implements it as the __float128 type, and there is (if memory serves) a compiler option to make long double use it.
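    A minimal sketch of the extra precision, assuming GCC or Clang on a target where __float128 is available (the compiler then defines __SIZEOF_FLOAT128__; x86-64 is one such target). With 113 significand bits (113 × log10 2 ≈ 34.02 digits), 1 + 2^-60 is exact in __float128 but rounds back to 1 in double. The function names are illustrative. __float128 arithmetic is emulated in software by libgcc, so expect it to be much slower than double.

    ```c
    /* Returns 1 if 1 + 2^-60 is distinguishable from 1 in double.
       A double has a 53-bit significand, so this should return 0. */
    int double_keeps_2_pow_minus_60(void) {
        volatile double tiny = 0x1p-60;    /* volatile: stop constant folding */
        volatile double sum = 1.0 + tiny;  /* storing to a double rounds to 53 bits */
        return sum != 1.0;
    }

    #ifdef __SIZEOF_FLOAT128__
    /* Same test in GCC's __float128 (113-bit significand): the sum is exact. */
    int quad_keeps_2_pow_minus_60(void) {
        volatile __float128 tiny = 0x1p-60;
        volatile __float128 sum = 1 + tiny;
        return sum != 1;
    }
    #endif
    ```

    The sketch deliberately avoids printing __float128 values: printf has no conversion for them, and GCC's libquadmath provides quadmath_snprintf for that (link with -lquadmath).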

  • 2020-12-31 16:06

    On Intel (x86) architectures, long double is usually the 80-bit x87 extended-precision format, at least with GCC and Clang.

    What kind of values do you need to represent? If their range is known in advance, you may be better off using fixed precision (fixed-point arithmetic).
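    One way to read "fixed precision" is fixed-point arithmetic: store every value as an integer count of a fixed unit, so addition and subtraction are exact as long as nothing overflows. A minimal sketch, assuming a hypothetical type fixed6 with 10^6 units per 1.0 stored in int64_t (all names here are illustrative, not from any library):

    ```c
    #include <stdint.h>

    /* Hypothetical fixed-point type: value = units / FIXED_SCALE. */
    typedef int64_t fixed6;
    #define FIXED_SCALE 1000000LL

    /* Builds a fixed6 from a whole number of millionths (exact). */
    static inline fixed6 fixed_from_micro(int64_t micro) { return micro; }

    /* Addition is plain integer addition, so it is exact (barring overflow). */
    static inline fixed6 fixed_add(fixed6 a, fixed6 b) { return a + b; }

    /* Multiplication must rescale: truncates toward zero and overflows
       once a * b exceeds the int64_t range (about 9.2e18). */
    static inline fixed6 fixed_mul(fixed6 a, fixed6 b) { return a * b / FIXED_SCALE; }

    static inline double fixed_to_double(fixed6 a) { return (double)a / FIXED_SCALE; }

    /* Adds 0.1 ten times in fixed point: the result is exactly 1.0. */
    fixed6 fixed_tenth_times_ten(void) {
        fixed6 s = 0;
        for (int i = 0; i < 10; i++)
            s = fixed_add(s, fixed_from_micro(100000));
        return s;
    }

    /* Same sum in double: 0.1 is not exactly representable in binary. */
    double double_tenth_times_ten(void) {
        double s = 0.0;
        for (int i = 0; i < 10; i++)
            s += 0.1;
        return s;
    }
    ```

    fixed_tenth_times_ten() returns exactly FIXED_SCALE (1.0), whereas double_tenth_times_ten() returns 0.9999999999999999. The trade-off is a fixed range and resolution chosen up front, which is why the answer asks what values you need to represent.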
