Decimal accuracy of binary floating point numbers

情到浓时终转凉″ 提交于 2019-12-21 20:55:30

问题


I've found this problem in many interview exams, but don't see how to work out the proper solution myself. The problem is:

How many digits of accuracy can be represented by a floating point number represented by two 16-bit words?

The solution is apparently approximately 6 digits.

Where does this come from, and how would you work it out?


回答1:


It's quite simple: a 32 bit IEEE-754 float has 23+1 bits for the mantissa (AKA significand, in IEEE-speak). The size of the mantissa more or less determines the number of representable digits.

To get the number of significant digits, simply calculate log10(224), which is approx. 7.22. (or, if you think that only 23 bits count, as the top bit is fixed anyway, you get log10(223), which is approx. 6.92). So in effect, you have about 6-7 significant digits, for normalized values.

The same can be done for 64 bit floating point values (doubles). They have 52 (or 53) bits to store the mantissa, so calculate log10(252), which is approx. 15.6 (or 15.9 for 53 bits), which gives you about 15 significant digits.




回答2:


An IEEE-754 floating point number has of a sign bit, some number of bits (e) for the exponent, and some number (m) for the mantissa, the number that's multiplied by 2 to the exponent. The resulting number is of the form

± m × 2e
eg, 0b1.01 × 2-0b0100 = 1.25 × 2-4 = .078125

which is directly analogous to a real number in (decimal) scientific notation,

± m × 10e
eg 7.8125 × 10-2.

Just as the number of significant digits in decimal is independent of the exponent part, so too in binary floating point the precision is set entirely by the number of bits in the mantissa. The longer the mantissa, the higher level of precision the number can represent. For a 32-bit floating point number, the IEEE-754 standard sets the number of bits in the mantissa to be 23 bits (+1 plus sign +8 for exponent); for a 64-bit floating point number, it is 52 bits (+1, +11).

Scientific notation has another convention; the mantissa must be between 1 (100) and 10 (101). That greatly simplifies making comparisons by making the representation in scientific notation unique - there is only one way of writing a number in scientific notation. That is, 200 is expressed as 2×102, not 20×101 or 0.2×103. Thus one can very quickly compare numbers - any number of the form +xyz×102 is necessarily less than one of the form +abc×103.

Similarly in binary, the mantissa must be between 1 and 2 (20...21). Thus the first bit of the mantissa must be 1; since it is known what its value must be, it doesn't need to be explicitly stored, and so you effectively have 24 and 53 bits of mantissa. (For very small numbers - denormalized numbers - the bit is implicitly a 0, not a 1, but the result is the same).

So as a result, the 24-bit mantissa in a 32-bit floating point number can range from

0b1.00000000000000000000001 ~ 1.00000012
     to
0b1.11111111111111111111111 ~ 1.99999988

that is, the smallest increment above 1 you can get or below 1 you can get is in the 7th decimal place. Another way of looking at it is to consider a numbers near the middle and look at what the spacing is:

0b1.01111111111111111111110 ~ 1.49999976
0b1.01111111111111111111111 ~ 1.49999988
0b1.10000000000000000000000 ~ 1.5
0b1.10000000000000000000001 ~ 1.50000012

so you get spacings of about 1.2 in the seventh decimal place - so you get something less than seven but more than six digits of precision. That spacing varies somewhat through the range of the numbers; also, one is rarely just doing one floating point operation, and these errors due to rounding propagate, so people usually talk about 6 digits of precision. It should be noted too that while the precision depends only on the size of the mantissa, how that "converts" to errors in decimal digits depends somewhat on the exponent as well; you can see that by taking some of those values and multiplying them by powers of two. But six digits of precision is a good rule of thumb.

The wikipedia page does a good job of giving an overview of floating point numbers, and the comprehensive reference is Goldberg's What Every Computer Scientist Should Know About Floating-Point Arithmetic.



来源:https://stackoverflow.com/questions/24377058/decimal-accuracy-of-binary-floating-point-numbers

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!