Strange float imprecision error in C [duplicate]

This question already has an answer here:

Is floating point math broken? 31 answers

Something strange happened to me today: when I compile and run this code, the output isn't what I expected. The code simply adds floating-point values to an array of float and then prints them out. The code:

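#include <stdio.h>

int main(void){
    float r[10];
    int z;
    int i=34;              /* initial value is overwritten in the loop */
    for(z=0;z<10;z++){
        i=z*z*z;           /* cube of z: 0, 1, 8, 27, ... */
        r[z]=i;            /* int converted to float */
        r[z]=r[z]+0.634;   /* sum computed in double, then rounded to float */
        printf("%f\n",r[z]);
    }
    return 0;
}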

the output:

Note that starting from 27, extra digits appear after the .634 that should not be there. Does anyone know why this happens? Is it caused by floating-point approximation?

P.S. I have a 64-bit Debian Linux system.

Thanks, all.

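A number may be represented in the following form:

[sign] [mantissa] * 2^[exponent]

Because only a fixed number of bits is available for the mantissa, a value that does not fit exactly has to be rounded, which introduces a relative error. The fewer bits there are, the larger that error.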

From Wikipedia:

Single-precision floating-point format is a computer number format that occupies 4 bytes (32 bits) in computer memory and represents a wide dynamic range of values by using a floating point.

The IEEE 754 standard specifies a binary32 as having:

Sign bit: 1 bit
Exponent width: 8 bits
Significand precision: 24 bits (23 explicitly stored)

This gives from 6 to 9 significant decimal digits precision (if a decimal string with at most 6 significant decimal is converted to IEEE 754 single precision and then converted back to the same number of significant decimal, then the final string should match the original; and if an IEEE 754 single precision is converted to a decimal string with at least 9 significant decimal and then converted back to single, then the final number must match the original [4]).

Edit (Edward's comment): Larger (more bits) floating point representations allow for greater precision.

Yes, this is a floating-point approximation error, also called round-off error. Floating-point representation uses quantization to cover a large range of numbers: it can represent only discrete steps, and every in-between number is rounded to the nearest step. This causes an error whenever the wanted number is not one of these steps.

In addition to the other useful answers, it can be illustrative to print more digits than the default:

#include <stdio.h>

int main(void){
    float r[10];
    int z;
    int i=34;
    for(z=0;z<10;z++){
        i=z*z*z;
        r[z]=i;
        r[z]=r[z]+0.634;
        printf("%.30f\n",r[z]);   /* 30 decimals expose the exact stored value */
    }
    return 0;
}

gives

0.634000003337860107421875000000
1.633999943733215332031250000000
8.633999824523925781250000000000
27.634000778198242187500000000000
64.634002685546875000000000000000
125.634002685546875000000000000000
216.634002685546875000000000000000
343.634002685546875000000000000000
512.633972167968750000000000000000
729.633972167968750000000000000000

In particular, note that 0.634 isn't actually "0.634", but instead is the closest number representable by a float.

"float" has only about six digits of precision, so it isn't unexpected that you get errors that large.

If you used "double", you would have about 15 digits of precision. You would still have an error, but you would get, for example, 125.634000000000003 and not 125.634003.

So you will always get rounding errors, and your results will not be quite what you expect, but with double the effect will be minimal. Warning: if you do something like add 125 + 0.634 and then subtract 125, the result will (most likely) not be 0.634, no matter whether you use float or double. But with double, the result will be very, very close to 0.634.

In principle, given the choice of float and double, you should never use float, unless you have a very, very good reason.

Source: https://stackoverflow.com/questions/27621542/float-strange-imprecision-error-in-c
