In my project I have to compute division, multiplication, subtraction, and addition on a matrix of double elements. The problem is that when the size of the matrix increases, the accumulated rounding error grows and the results become imprecise. Are there floating-point types with greater precision than double?
You might want to consider the order of operations, i.e. do the additions in an ordered sequence, starting with the smallest values first. This will increase the overall accuracy of the results for the same mantissa precision:
1e00 + 1e-16 + ... + 1e-16 (1e16 times) = 1e00
1e-16 + ... + 1e-16 (1e16 times) + 1e00 = 2e00
The point is that adding small numbers to a large number will make them disappear, so the latter approach reduces the numerical error.
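A minimal C++ sketch of this effect (the loop count and magnitudes are illustrative; any addend below the spacing of doubles near 1.0 behaves this way):

```cpp
#include <cstdio>

int main() {
    const int n = 1000000;      // one million terms
    const double tiny = 1e-17;  // below half the spacing of doubles near 1.0

    // Large value first: every tiny addend is rounded away.
    double a = 1.0;
    for (int i = 0; i < n; ++i) a += tiny;

    // Small values first: they accumulate before meeting the large value.
    double b = 0.0;
    for (int i = 0; i < n; ++i) b += tiny;
    b += 1.0;

    printf("large first: %.17g\n", a);  // prints 1 exactly
    printf("small first: %.17g\n", b);  // prints approximately 1.00000000001
    return 0;
}
```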
Floating-point data types with greater precision than double are going to depend on your compiler and architecture. In order to get more than double precision, you may need to rely on a math library that supports arbitrary-precision calculations. These probably won't be fast, though.
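For instance, a sketch using Boost.Multiprecision (the library choice is an assumption; the answer names no specific one):

```cpp
#include <boost/multiprecision/cpp_bin_float.hpp>
#include <iomanip>
#include <iostream>

int main() {
    // cpp_bin_float_50 carries roughly 50 significant decimal digits,
    // at the cost of software arithmetic (much slower than hardware double).
    using boost::multiprecision::cpp_bin_float_50;

    cpp_bin_float_50 big = 1;
    cpp_bin_float_50 tiny("1e-30");  // far below double's 15-17 digits

    // With double, 1 + 1e-30 == 1; here the tiny term survives.
    std::cout << std::setprecision(35) << big + tiny << "\n";
    return 0;
}
```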
According to Wikipedia, the 80-bit "Intel" IEEE 754 extended-precision long double, which is 80 bits padded to 16 bytes in memory, has a 64-bit mantissa with no implicit bit, which gets you 19.26 decimal digits. This has been the almost universal standard for long double for ages, but recently things have started to change.
The newer 128-bit quad-precision format has 112 mantissa bits plus an implicit bit, which gets you 34 decimal digits. GCC implements this as the __float128 type, and there is (if memory serves) a compiler option to set long double to it.
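A small sketch comparing the two, assuming GCC on x86 with libquadmath available (link with -lquadmath):

```cpp
// Build: g++ quad_demo.cpp -lquadmath
#include <cstdio>
#include <quadmath.h>

int main() {
    long double ld = 1.0L / 3.0L;        // 80-bit extended on x86: ~19 digits
    __float128  q  = (__float128)1 / 3;  // quad precision: ~34 digits

    printf("long double: %.21Lg\n", ld);

    // __float128 has no printf conversion; libquadmath provides its own formatter.
    char buf[64];
    quadmath_snprintf(buf, sizeof buf, "%.36Qg", q);
    printf("__float128:  %s\n", buf);
    return 0;
}
```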
On Intel architectures, long double is 80 bits wide (with a 64-bit mantissa).
What kind of values do you want to represent? Maybe you are better off using fixed-point arithmetic, where values are exact multiples of a fixed scale.
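A minimal fixed-point sketch in C++ (the Fixed type and the 1/10000 scale are illustrative, not a specific library):

```cpp
#include <cstdint>
#include <cstdio>

// Values are stored as integer multiples of 1/10000 (four decimal places).
// Addition and subtraction are exact as long as int64_t does not overflow.
struct Fixed {
    static constexpr int64_t SCALE = 10000;
    int64_t raw;  // the represented value times SCALE

    static Fixed from_double(double d) {
        return Fixed{ (int64_t)(d * SCALE + (d < 0 ? -0.5 : 0.5)) };  // round to nearest
    }
    double to_double() const { return (double)raw / SCALE; }

    Fixed operator+(Fixed o) const { return Fixed{ raw + o.raw }; }
    Fixed operator-(Fixed o) const { return Fixed{ raw - o.raw }; }
    Fixed operator*(Fixed o) const { return Fixed{ raw * o.raw / SCALE }; }
    Fixed operator/(Fixed o) const { return Fixed{ raw * SCALE / o.raw }; }
};

int main() {
    Fixed a = Fixed::from_double(0.1);
    Fixed b = Fixed::from_double(0.2);
    printf("%.4f\n", (a + b).to_double());  // exactly 0.3000, unlike raw double
    return 0;
}
```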