Practices to limit floating point accuracy problems


As programmers, most (if not all) of us know that floating-point numbers tend not to be very accurate. I know that this problem can't be avoided entirely, but I am w


2 answers

时光说笑

2021-01-22 10:42

Use fixed-point mathematics where you can deal with a known limited precision.

As an example, the Rockbox music player firmware uses almost entirely fixed-point media codecs.
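To make the idea concrete, here is a minimal sketch of fixed-point arithmetic in Python. The Q16.16 layout (16 fractional bits) and the helper names to_fixed, to_float and fixed_mul are assumptions chosen for illustration; they are not taken from Rockbox or any particular codec.

```python
# Q16.16 fixed point: a value x is stored as the integer round(x * 2**16).
# The format and helper names are illustrative assumptions.
FRAC_BITS = 16
ONE = 1 << FRAC_BITS

def to_fixed(x: float) -> int:
    """Convert a float to Q16.16, rounding to the nearest step of 2**-16."""
    return round(x * ONE)

def to_float(f: int) -> float:
    """Convert a Q16.16 integer back to a float, e.g. for display."""
    return f / ONE

def fixed_mul(a: int, b: int) -> int:
    """Multiply two Q16.16 values.

    The raw integer product carries 32 fractional bits, so shift right
    by 16 to return to Q16.16 (this rounds toward negative infinity).
    """
    return (a * b) >> FRAC_BITS

# Addition is plain integer addition: exact, and independent of order.
tenth = to_fixed(0.1)
total = sum([tenth] * 10)

quarter = fixed_mul(to_fixed(0.5), to_fixed(0.5))
print(to_float(total), to_float(quarter))
```

The precision is limited (steps of 2**-16 here), but it is known in advance and the same everywhere in the range, which is the point of the technique.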

If you must be perfectly accurate, use an arbitrary-precision type like those provided by the GMP library.
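GMP itself is a C library. As a stand-in, the sketch below uses Python's standard-library fractions.Fraction, which is built on arbitrary-precision integers, to show what exact arithmetic buys you. It illustrates the idea only and is not GMP's API.

```python
from fractions import Fraction

# Binary floating point cannot represent 0.1, 0.2 or 0.3 exactly,
# so the rounded sum does not match the rounded literal.
float_sum = 0.1 + 0.2
print(float_sum == 0.3)               # False

# Exact rationals have no rounding step at all.
exact_sum = Fraction(1, 10) + Fraction(2, 10)
print(exact_sum == Fraction(3, 10))   # True
```

The cost is speed and memory: numerators and denominators can grow without bound as a computation proceeds.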

If you're just trying to cut down on your errors, try to keep intermediate values as close to zero as possible, where IEEE floating-point numbers are most densely spaced and the absolute rounding error is smallest. Reorder your operations so that absolute values do not grow too large.
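A small sketch of why ordering matters, assuming IEEE 754 doubles (which is what Python floats are on common platforms). The variable names are illustrative.

```python
import math

big = 1e16
small = 1.0

# Near 1e16 adjacent doubles are 2 apart, so adding 1.0 is lost to rounding.
lossy = (big + small) - big
print(lossy)    # 0.0

# Cancelling the large values first keeps the intermediate result near zero.
better = (big - big) + small
print(better)   # 1.0

# When reordering by hand is impractical, math.fsum tracks the lost
# low-order bits and returns a correctly rounded sum.
print(math.fsum([big, small, -big]))   # 1.0
```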

猫巷女王i

2021-01-22 10:57
Floating-point accuracy is a large subject, and some of the brightest computer scientists have worked on it for many years. If you haven't studied floating-point accuracy, haven't thoroughly analyzed your problem, or can't rely on teammates to fully understand it, just stick with doubles rather than 32-bit floats, unless you're doing computer graphics or the project specifically calls for singles.
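To get a feel for the gap between the two formats, the sketch below measures how far each one lands from the decimal value 0.1. Python has no built-in 32-bit float, so the helper to_float32 (a name made up for this example) round-trips a value through the struct module to simulate one.

```python
import struct
from fractions import Fraction

def to_float32(x: float) -> float:
    """Round a Python float (a 64-bit double) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

exact = Fraction(1, 10)

# Fraction(some_float) converts the stored binary value exactly,
# so these differences are the true representation errors.
err64 = abs(Fraction(0.1) - exact)
err32 = abs(Fraction(to_float32(0.1)) - exact)

print(float(err64))   # on the order of 1e-17
print(float(err32))   # on the order of 1e-9
```

A double carries 53 significant bits against 24 for a single, so each individual rounding is hundreds of millions of times smaller.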

Some operations, like multiplication, are well behaved: regrouping the factors changes the result very little. For example, using Python:
```
>>> a*a*a*a*a*a
1.1044776737696922
>>> (a*a*a)*(a*a*a)
1.104477673769692
>>> (a*a)*(a*a)*(a*a)
1.104477673769692
```
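The answers agree to about 16 significant digits because multiplication simply adds the exponents and multiplies the mantissas (1.fraction...). Each product is rounded to the nearest representable value, so the relative error stays tiny however the factors are grouped.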

On the other hand, if we perform subtraction and multiplication in the wrong order, we can get noticeably different results.
```
>>> b = 1.00016789
>>> b*(b-1)
0.00016791818705204833
>>> b*b - b
0.00016791818705197414
```

Even though this looks fine, if you look closely, you'll see that the two results agree to only about 11 significant digits. To view it another way, ((b*(b-1)) - (b*b-b))/b should be zero algebraically, but it comes out to 7.417408056593443e-17. That may seem like a small error, but floating-point errors tend to accumulate. Had we used single precision (float b = 1.00016789; in C syntax), the problem would be much worse: only a few reliable decimal digits would remain after such a small set of operations.
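The single-precision case can be tried without leaving Python. This is a rough sketch that simulates 32-bit arithmetic by rounding after every operation with a helper, f32, that is an assumption of this example (it round-trips through the struct module). A C program using float may differ slightly if the compiler keeps intermediates in higher precision.

```python
import struct

def f32(x: float) -> float:
    """Round a double to the nearest 32-bit float (simulated single precision)."""
    return struct.unpack("f", struct.pack("f", x))[0]

b = f32(1.00016789)

# Round after each operation, as single-precision hardware would.
factored = f32(b * f32(b - 1.0))   # b*(b-1)
expanded = f32(f32(b * b) - b)     # b*b - b

rel_diff = abs(factored - expanded) / factored
print(factored, expanded, rel_diff)
```

In b*b - b the product is rounded while it is still close to 1, and the subtraction then exposes that rounding error relative to a much smaller result. The factored form subtracts first, while the operands are still exact, which is why it is the safer ordering.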