How to calculate float type precision and does it make sense?

风流意气都作罢 submitted on 2020-05-22 10:04:27

Question


I have a problem understanding the precision of the float type. MSDN states that its precision is from 6 to 9 digits. But I notice that the precision depends on the size of the number:

  float smallNumber = 1.0000001f;
  Console.WriteLine(smallNumber); // 1.0000001

  float bigNumber = 100000001f;
  Console.WriteLine(bigNumber); // 100000000

smallNumber is more precise than bigNumber. I understand IEEE 754, but I don't understand how MSDN calculates the precision. Does it make sense?

Also, you can play with the representation of numbers in float format here. Enter the value 100000000 in the "You entered" input and click "+1" on the right. Then change the input's value to 1 and click "+1" again. You will see the difference in precision.


Answer 1:


The MSDN documentation is nonsensical and wrong.

Bad concept. Binary-floating-point format does not have any precision in decimal digits because it has no decimal digits at all. It represents numbers with a sign, a fixed number of binary digits (bits), and an exponent for a power of two.

Wrong on the high end. The floating-point format represents many numbers exactly, with infinite precision. For example, “3” is represented exactly. You can write it in decimal arbitrarily far, 3.0000000000…, and all of the decimal digits will be correct. Another example is 1.40129846432481707092372958328991613128026194187651577175706828388979108268586060148663818836212158203125e-45. This number has 105 significant digits in decimal, but the float format represents it exactly (it is 2^-149).

Wrong on the low end. When “999999.97” is converted from decimal to float, the result is 1,000,000. So not even one decimal digit is correct.

Not a measure of accuracy. Because the float significand has 24 bits, the resolution of its lowest bit is about 2^23 times finer than the resolution of its highest bit. This is about 6.9 digits in the sense that log10(2^23) is about 6.9. But that just tells us the resolution (the coarseness) of the representation. When we convert a number to the float format, we get a result that differs from the number by at most ½ of this resolution, because we round to the nearest representable value. So a conversion to float has a relative error of at most 1 part in 2^24, which corresponds to about 7.2 digits in the above sense.

So, if “~6-9 digits” is not a correct concept, does not come from actual bounds on the digits, and does not measure accuracy, where does it come from? We cannot be sure, but 6 and 9 do appear in two descriptions of the float format.

6 is the largest number x for which this is guaranteed:

  • If any decimal numeral with at most x significant digits is within the finite bounds of the float format and is converted to the nearest value represented in the format, then, when the result is converted to the nearest decimal numeral with at most x significant digits, the result of that conversion equals the original number.

9 is the smallest number x that guarantees this:

  • If any finite float number is converted to the nearest decimal numeral with x digits, then, when the result is converted to the nearest value representable in float, the result of that conversion equals the original number.

As an analogy, if float is a container, then the largest “decimal container” guaranteed to fit inside it is six digits, and the smallest “decimal container” guaranteed to hold it is nine digits. 6 and 9 are akin to interior and exterior measurements of the float container.

Suppose you had a block 7.2 units long, and you were looking at its placement on a line of bricks each 1 unit long. If you put the start of the block at the start of a brick, it will extend 7.2 bricks. However, if somebody else chooses where it starts, they might start it in the middle of a brick. Then it would cover part of that brick, all of the next 6 bricks, and part of the last brick (e.g., .5 + 6 + .7 = 7.2). So a 7.2-unit block is only guaranteed to cover 6 bricks. Conversely, 8 bricks can cover the 7.2-unit block if you choose where they are placed. But if somebody else chooses where they start, the first might cover just .1 units of the block. Then you need 7 more and another fraction, so 9 bricks are needed.

The reason this analogy holds is that powers of two and powers of 10 are irregularly spaced relative to each other. 2^10 (1024) is near 10^3 (1000). 10 is the exponent used in the float format for numbers from 1024 (inclusive) to 2048 (exclusive). So this interval from 1024 to 2048 is like a block that has been placed just after the 100-1000 block ends and the 1000-10,000 block starts.

For better understanding of floating-point arithmetic, consider studying the IEEE-754 Standard for Floating-Point Arithmetic or a good textbook like Handbook of Floating-Point Arithmetic by Jean-Michel Muller et al.




Answer 2:


Yes, the number of digits before rounding errors appear is a measure of precision, but you cannot assess precision from just two numbers, because either one might simply be closer to or further from the rounding threshold.

To better understand the situation, you need to see how floats are represented.

The IEEE 754 32-bit floats are stored as:

bool(1bit sign) * integer(24bit mantissa) << integer(8bit exponent)

Yes, the mantissa is 24 bits instead of 23, as its MSB is implicitly set to 1.

As you can see, there are only integers and a bit shift. So if you are representing natural numbers up to 2^24, there is no rounding at all. For bigger numbers, binary zero padding occurs from the right, which causes the difference.

For digits after the decimal point, the zero padding occurs from the left. But there is another problem: in binary you cannot store some decimal numbers exactly. For example:

0.3 dec = 0.0100110011001100110011001100110011001... bin
0.25 dec = 0.01 bin

As you can see, the binary expansion of 0.3 dec is infinite (just as we cannot write 1/3 exactly in decimal), hence if you crop it to only 24 bits you lose the rest, and the number is not what you wanted anymore.

If you compare 0.3 and 0.125, then 0.125 is exact and 0.3 is not, even though 0.125 is much smaller than 0.3. So your measure is not correct unless you explore many more very close values that cover the rounding steps and compute the maximum difference over such a set. For example, you could compare

1.0000001f
1.0000002f
1.0000003f
1.0000004f
1.0000005f
1.0000006f
1.0000007f
1.0000008f
1.0000009f

and remember the maximum of fabs(x-round(x)), and then do the same for

100000001
100000002
100000003
100000004
100000005
100000006
100000007
100000008
100000009

And then compare the two differences.

On top of all this, you are missing one very important thing: the errors introduced while converting from text to binary and back, which are usually even bigger. First of all, try to print your numbers without rounding (for example, force printing of 20 digits after the decimal point).

Also, the numbers are stored in base 2, so in order to print them you need to convert them to base 10, which involves multiplication and division by 10. The more bits are missing (zero padded) from the number, the bigger the print errors are. To be as precise as you can, a trick is used: print the number in hex (no rounding errors) and then convert the hex string itself to decimal using integer math. That is much more accurate than naive floating point prints. For more info see related QAs:

  • my best attempt to print 32 bit floats with least rounding errors (integer math only)
  • How do libraries/programming languages convert floats to strings
  • How do I convert a very long binary number to decimal?

Now, to get back to the number of "precise" digits represented by float. For the integer part of a number, that is easy:

dec_digits = floor(log10(2^24)) = floor(7.22) = 7

However, for digits after the decimal point this is not as precise (for the first few decimal digits), as there is a lot of rounding going on. For more info see:

  • How do you print the EXACT value of a floating point number?



Answer 3:


I think what they mean in their documentation is that, depending on the number, the precision ranges from 6 to 9 decimal digits. Go by the standard that is explained on the page you linked; sometimes Microsoft is a bit lazy when it comes to documentation, like the rest of us. The problem with floating point is that it is inexact for many decimal values. If you put the number 1.05 into the site in your link, you will notice that it cannot be stored exactly in floating point. It is actually stored as 1.0499999523162841796875. It's stored this way to do calculations faster. It's not great for money, e.g. what if your item is priced at $1.05 and you sell a billion of them?




Answer 4:


The smallNumber is more precise than big

Incorrect comparison. The other number has more significant digits.

1.0000001f is attempting N digits of decimal precision.
100000001f attempts N+1.

I have a problem understanding the precision of float type.

To best understand float precision, think binary. Use "%a" for printing with a C99 or later compiler.

float is stored in base 2. The significand is a dyadic rational: some integer divided by a power of 2.

float commonly has 24 bits of binary precision. (23-bit explicitly encoded, 1 implied)

Between [1.0 ... 2.0), there are 2^23 different float values.
Between [2.0 ... 4.0), there are 2^23 different float values.
Between [4.0 ... 8.0), there are 2^23 different float values.
...

The possible values of a float are not distributed uniformly among powers-of-10. The grouping of float values to power-of-10 (decimal precision) results in the wobbling 6 to 9 decimal digits of precision.


How to calculate float type precision?

To find the difference between subsequent float values, since C99, use nextafterf()

Illustrative code:

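#include <math.h>
#include <stdio.h>

// Print the float just below b, b itself, and the float just above b,
// each in hexadecimal ("%a") and in decimal, then estimate the local precision.
// Intended for positive, finite b.
void foooo(float b) {
  float a = nextafterf(b, 0);         // neighbor toward zero
  float c = nextafterf(b, b * 2.0f);  // neighbor away from zero
  printf("%-15a %.9e\n", a, a);
  printf("%-15a %.9e\n", b, b);
  printf("%-15a %.9e\n", c, c);
  // (c - b) / b is the relative step size at b.
  printf("Local decimal precision %.2f digits\n", 1.0 - log10((c - b) / b));
}

int main(void) {
  foooo(1.0000001f);
  foooo(100000001.0f);
  return 0;
}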

Output

0x1p+0          1.000000000e+00
0x1.000002p+0   1.000000119e+00
0x1.000004p+0   1.000000238e+00
Local decimal precision 7.92 digits
0x1.7d783ep+26  9.999999200e+07
0x1.7d784p+26   1.000000000e+08
0x1.7d7842p+26  1.000000080e+08
Local decimal precision 8.10 digits


Source: https://stackoverflow.com/questions/61609276/how-to-calculate-float-type-precision-and-does-it-make-sense
