floating point issue

醉梦人生

I have a floating-point value, 0.1, entered from the UI.

But when I convert that string to a float, I get 0.10...01. The problem is the appending of non zero dig

4 answers

执笔经年

2020-12-12 07:46

Since floats get stored in binary, the fractional portion is effectively in base two... and one-tenth is a repeating fraction in base two, just as one-ninth is a repeating decimal in base ten.

The most common ways to deal with this are to store your values as appropriately-scaled integers, as in the C# or SQL currency types, or to round off floating-point numbers when you display them.
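Both approaches can be sketched in a few lines. Python is used here only for illustration; the scale of 100 (two decimal places) and the helper `format_units` are assumptions made for the example, not something from the question.

```python
# Binary floating point: 0.1 + 0.2 is not exactly 0.3.
float_sum = 0.1 + 0.2

# Option 1: keep the float and round only when displaying it.
shown = f"{float_sum:.2f}"

# Option 2: keep amounts as scaled integers (assumed scale: hundredths).
SCALE = 100
a_units = 10  # represents 0.10
b_units = 20  # represents 0.20
total_units = a_units + b_units  # exact integer arithmetic


def format_units(units: int, scale: int = SCALE) -> str:
    """Render a non-negative scaled integer as a decimal string."""
    width = len(str(scale)) - 1  # number of decimal places implied by the scale
    whole, frac = divmod(units, scale)
    return f"{whole}.{frac:0{width}d}"


print(float_sum == 0.3)           # False: the binary sum is slightly off
print(shown)                      # 0.30
print(format_units(total_units))  # 0.30
```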

情话喂你

2020-12-12 07:51

0.1 (decimal) = 0.00011001100110011... (binary)

So, in general, a number you can represent with a finite number of decimal digits may not be representable with a finite number of bits. But floating point numbers only store the N most significant bits. So, conversions between a decimal string and a "binary" float usually involve rounding.
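The binary expansion above can be reproduced by repeated doubling with exact rational arithmetic. This is a small illustrative sketch in Python; the function name `binary_fraction_digits` is made up for the example.

```python
from fractions import Fraction


def binary_fraction_digits(value: Fraction, count: int) -> str:
    """Return the first `count` binary digits after the point, for 0 <= value < 1."""
    digits = []
    for _ in range(count):
        value *= 2                # shift one binary place to the left
        bit = int(value >= 1)     # the digit that moved in front of the point
        digits.append(str(bit))
        value -= bit              # keep only the fractional part
    return "".join(digits)


print("0." + binary_fraction_digits(Fraction(1, 10), 17))  # 0.00011001100110011
print("0." + binary_fraction_digits(Fraction(1, 2), 4))    # 0.1000
```

One-tenth never reaches a zero remainder, so the `0011` pattern repeats forever, while one-half terminates after a single bit.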

However a lossless roundtrip conversion decimal string -> double -> decimal string is possible if you restrict yourself to decimal strings with at most 15 significant digits (assuming IEEE 754 64 bit floats). This includes the last conversion. You need to produce a string from the double with at most 15 significant digits.

It is also possible to make the roundtrip double -> string -> double lossless. But here you may need decimal strings with 17 significant digits to make it work (again assuming IEEE 754 64-bit floats).
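A quick way to check both round trips, assuming a language whose float type is an IEEE 754 double (Python's `float` is used here as the example):

```python
text = "0.1"
value = float(text)

# decimal string -> double -> decimal string, formatted with 15 significant digits
back_to_text = f"{value:.15g}"

# 17 significant digits expose the rounding that happened on input
full = f"{value:.17g}"

# double -> decimal string -> double needs up to 17 significant digits
x = 0.1 + 0.2
restored_17 = float(f"{x:.17g}")
restored_15 = float(f"{x:.15g}")

print(back_to_text)      # 0.1
print(full)              # 0.10000000000000001
print(restored_17 == x)  # True
print(restored_15 == x)  # False: 15 digits are not enough for this double
```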

Happy的楠姐

2020-12-12 07:52
The best site I've ever seen that explains why some numbers can't be represented exactly is Harald Schmidt's IEEE754 Converter site.

It's an online tool for showing representations of IEEE754 single precision values and I liked it so much, I wrote my own Java app to do it (and double precision as well).

Bottom line, there are only about four billion different 32-bit values you can have but there are an infinite number of real values between any two different values. So you have a problem with precision. That's something you'll have to get used to.

If you want more precision and/or a better type for decimal values, you can either:
- switch to a higher number of bits.
- use a decimal type.
- use a big-number library like GMP (although I refuse to use this in production code since I discovered it doesn't handle memory shortages elegantly).

Alternatively, you can use the inaccurate values (their relative error is very low, at most about six parts per hundred million for single-precision floats) and just print them out with less precision. Printing out 0.10000000149 to two decimal places will get you 0.10.
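As a rough illustration of the last two ideas, the sketch below pushes 0.1 through a 4-byte single using Python's `struct` module, prints it with more and fewer digits, and then does the same arithmetic with a decimal type. Python's `decimal.Decimal` merely stands in for whatever decimal type your language offers.

```python
import struct
from decimal import Decimal

# Round-trip 0.1 through a 4-byte IEEE 754 single to see the stored value.
single = struct.unpack("<f", struct.pack("<f", 0.1))[0]

print(f"{single:.11f}")  # 0.10000000149 (more digits than a single can justify)
print(f"{single:.2f}")   # 0.10

# A decimal type holds 0.1 exactly when it is built from the string.
tenth = Decimal("0.1")
print(tenth + tenth + tenth == Decimal("0.3"))  # True
```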

You would have to do millions and millions of additions for the error to accumulate noticeably. Fewer for other operations, of course, but still a lot.
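To get a feel for the scale, here is a small experiment that adds 0.1 one million times. Note the assumption: Python floats are doubles, so this shows double-precision drift; single-precision values carry fewer bits and would drift sooner.

```python
# Assumed workload: one million additions of 0.1 in double precision.
total = 0.0
for _ in range(1_000_000):
    total += 0.1

drift = abs(total - 100_000.0)

print(drift)           # accumulated rounding error
print(f"{total:.2f}")  # 100000.00
```

The drift is there, but it stays far below what two decimal places can show.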

As to why you're getting that value, 0.1 is stored in IEEE754 single precision format as follows (sign, exponent and mantissa):
```
s eeeeeeee mmmmmmmmmmmmmmmmmmmmmmm    1/n
0 01111011 10011001100110011001101
           ||||||||||||||||||||||+- 8388608
           |||||||||||||||||||||+-- 4194304
           ||||||||||||||||||||+--- 2097152
           |||||||||||||||||||+---- 1048576
           ||||||||||||||||||+-----  524288
           |||||||||||||||||+------  262144
           ||||||||||||||||+-------  131072
           |||||||||||||||+--------   65536
           ||||||||||||||+---------   32768
           |||||||||||||+----------   16384
           ||||||||||||+-----------    8192
           |||||||||||+------------    4096
           ||||||||||+-------------    2048
           |||||||||+--------------    1024
           ||||||||+---------------     512
           |||||||+----------------     256
           ||||||+-----------------     128
           |||||+------------------      64
           ||||+-------------------      32
           |||+--------------------      16
           ||+---------------------       8
           |+----------------------       4
           +-----------------------       2
```
The sign is positive, that's pretty easy.

The exponent bits add up to 64+32+16+8+2+1 = 123. Subtracting the bias of 127 gives -4, so the multiplier is 2^-4 or 1/16.

The mantissa is chunky. It consists of 1 (the implicit base) plus (for all those bits with each being worth 1/(2ⁿ) as n starts at 1 and increases to the right), {1/2, 1/16, 1/32, 1/256, 1/512, 1/4096, 1/8192, 1/65536, 1/131072, 1/1048576, 1/2097152, 1/8388608}.

When you add all these up, you get 1.60000002384185791015625.
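If you would rather not add the fractions by hand, the same fields can be pulled out of the bit pattern programmatically. This is a sketch using Python's `struct` and `fractions` modules; the variable names are only illustrative.

```python
import struct
from fractions import Fraction

# Reinterpret the single-precision encoding of 0.1 as a 32-bit integer.
bits = struct.unpack(">I", struct.pack(">f", 0.1))[0]

sign = bits >> 31                     # 1 bit
exponent_field = (bits >> 23) & 0xFF  # 8 bits, biased by 127
fraction_field = bits & 0x7FFFFF      # 23 stored mantissa bits

# Implicit leading 1 plus the stored bits, kept as an exact rational.
mantissa = 1 + Fraction(fraction_field, 2**23)
multiplier = Fraction(2) ** (exponent_field - 127)

print(f"{sign} {exponent_field:08b} {fraction_field:023b}")
print(exponent_field - 127)  # -4
print(multiplier)            # 1/16
print(mantissa == Fraction("1.60000002384185791015625"))  # True
```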

When you multiply that by the multiplier, you get 0.100000001490116119384765625, matching the double precision value on Harald's site as far as it's printed:
```
0.10000000149011612 (out by 0.00000000149011612)
```
And when you turn off the least significant (rightmost) bit, which is the smallest downward movement you can make, you get:
```
0.09999999403953552 (out by 0.00000000596046448)
```
Putting those two together:
```
0.10000000149011612 (out by 0.00000000149011612)
                                      |
0.09999999403953552 (out by 0.00000000596046448)
```
you can see that the first one is a closer match, by about a factor of four (14.9:59.6). So that's the closest value you can get to 0.1.
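The comparison can also be checked exactly. The sketch below (Python, with a hypothetical helper `single_from_bits`) takes the stored bit pattern, steps down by one unit in the last place, and compares both candidates against 1/10 using exact fractions.

```python
import struct
from decimal import Decimal
from fractions import Fraction


def single_from_bits(bits: int) -> float:
    """Interpret a 32-bit integer as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


bits = struct.unpack(">I", struct.pack(">f", 0.1))[0]
nearest = single_from_bits(bits)    # the value actually stored for 0.1
below = single_from_bits(bits - 1)  # rightmost mantissa bit turned off

tenth = Fraction(1, 10)
error_nearest = abs(Fraction(nearest) - tenth)
error_below = abs(Fraction(below) - tenth)

print(Decimal(nearest))             # 0.100000001490116119384765625
print(Decimal(below))
print(error_below / error_nearest)  # 4
```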

春和景丽

2020-12-12 08:02
You need to do some background reading on floating point representations: http://docs.sun.com/source/806-3568/ncg_goldberg.html.

Given computers are on-off switches, they're storing a rounded answer, and they work in base two not the base ten we humans seem to like.

Your options are to:
- display it back with fewer digits so you round back to base 10 (check out the Standard library's <iomanip> header, and setprecision)
- store the number in some actual decimal-capable object - you'll find plenty of C++ classes to do this via google, but none are provided in the Standard, nor in boost last I looked
- convert the input from a string directly to an integral number of some smaller unit (like thousandths), avoiding the rounding.
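The last option might look roughly like this. The sketch is in Python rather than C++ purely to keep it short, and `parse_thousandths` is a hypothetical helper that accepts only plain decimal strings with at most three decimal places.

```python
def parse_thousandths(text: str) -> int:
    """Convert a plain decimal string such as "0.1" to an integer count of thousandths."""
    text = text.strip()
    negative = text.startswith("-")
    if text[:1] in ("+", "-"):
        text = text[1:]
    whole, _, frac = text.partition(".")
    digits = whole + frac
    if not digits or not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"not a plain decimal number: {text!r}")
    if len(frac) > 3:
        raise ValueError("more than three decimal places")
    # Pad the fractional part to exactly three digits; no float is ever created.
    units = int(whole or "0") * 1000 + int(frac.ljust(3, "0"))
    return -units if negative else units


print(parse_thousandths("0.1"))   # 100
print(parse_thousandths("-2.5"))  # -2500
print(parse_thousandths("12"))    # 12000
```

Because the value never passes through a binary float, 0.1 arrives as exactly 100 thousandths.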