Adding smallest possible float to a float

半阙折子戏 2020-12-23 15:55

I want to add the smallest possible value of a float to a float. So, for example, I tried doing this to get 1.0 + the smallest possible float:

float result =         


        
4 Answers
  • 2020-12-23 16:34

    To increment or decrement a floating-point value by the smallest possible amount, use std::nextafter towards +/- infinity.

    If you just use std::nextafter(x, std::numeric_limits<T>::max()), the result will be wrong in case x is infinity.

    #include <iostream>
    #include <limits>
    #include <cmath>
    
    template<typename T>
    T next_above(const T& v){
        return std::nextafter(v,std::numeric_limits<T>::infinity()) ;
    }
    template<typename T>
    T next_below(const T& v){
        return std::nextafter(v,-std::numeric_limits<T>::infinity()) ;
    }
    
    int main(){
      std::cout << "eps   : "<<std::numeric_limits<double>::epsilon()<< std::endl; // gives eps
    
      std::cout << "after : "<<next_above(1.0) - 1.0<< std::endl; // gives eps (the definition of eps)
      std::cout << "below : "<<next_below(1.0) - 1.0<< std::endl; // gives -eps/2
    
      // Note: this is what next_above does:
      std::cout << std::nextafter(std::numeric_limits<double>::infinity(),
         std::numeric_limits<double>::infinity()) << std::endl; // gives inf
    
      // while this is probably not what you need:
      std::cout << std::nextafter(std::numeric_limits<double>::infinity(),
         std::numeric_limits<double>::max()) << std::endl; // gives 1.79769e+308
    
    }
    
  • 2020-12-23 16:40

    If you want the next representable value after 1, there is a function for that called std::nextafter, from the <cmath> header.

    float result = std::nextafter(1.0f, 2.0f);
    

    It returns the next representable value starting from the first argument in the direction of the second argument. So if you wanted to find the next value below 1, you could do this:

    float result = std::nextafter(1.0f, 0.0f);
    

    Adding the smallest positive representable value to 1 doesn't work because the difference between 1 and the next representable value is greater than the difference between 0 and the next representable value.
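A minimal sketch of that last point, assuming IEEE 754 single precision with the default round-to-nearest mode. The helper names `min_is_absorbed` and `gap_above_one` are made up for this illustration and are not part of any library.

```cpp
#include <cmath>
#include <limits>

// Returns true if adding the smallest positive normalized float to 1.0f
// leaves the value unchanged. The volatile store forces the sum to be
// rounded to float before the comparison.
bool min_is_absorbed() {
    volatile float sum = 1.0f + std::numeric_limits<float>::min();
    return sum == 1.0f;
}

// Distance from 1.0f to the next representable float above it.
float gap_above_one() {
    return std::nextafter(1.0f, 2.0f) - 1.0f;
}
```

On such a platform `min_is_absorbed()` should return true, while `gap_above_one()` should compare equal to `std::numeric_limits<float>::epsilon()`, which is far larger than `min()`.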

  • 2020-12-23 16:42

    The "problem" you're observing is because of the very nature of floating point arithmetic.

    In FP the precision depends on the scale; around the value 1.0 the precision is not enough to be able to differentiate between 1.0 and 1.0+min_representable where min_representable is the smallest possible value greater than zero (even if we only consider the smallest normalized number, std::numeric_limits<float>::min()... the smallest denormal is another few orders of magnitude smaller).

    For example with double-precision 64-bit IEEE754 floating point numbers, around the scale of x=10000000000000000 (10^16) it's impossible to distinguish between x and x+1.
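The double-precision claim can be checked with a short sketch, assuming IEEE 754 binary64 and round-to-nearest. The helper name `plus_one_is_lost` is hypothetical.

```cpp
// Returns true if x + 1.0 cannot be told apart from x in double precision.
// The volatile store forces the sum to be rounded to double before comparing.
bool plus_one_is_lost(double x) {
    volatile double y = x + 1.0;
    return y == x;
}
```

Under those assumptions `plus_one_is_lost(1e16)` should be true, because neighbouring doubles are 2 apart at that scale, while `plus_one_is_lost(1e15)` should be false, because 1e15 is still below 2^53.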


    The fact that the resolution changes with scale is the very reason for the name "floating point": the radix point "floats". A fixed point representation instead has a fixed resolution (for example with 16 binary digits below units you have a precision of 1/65536 ~ 0.000015 at every magnitude).
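For contrast, here is a rough sketch of a fixed point format with 16 fractional bits. Everything in it (`kQ16Scale`, `to_q16`, `from_q16`, `q16_step_at`) is a hypothetical illustration, not an existing API, and it assumes inputs stay well inside the range of a 32-bit integer divided by 65536.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical Q16.16 format: a 32-bit integer counting steps of 1/65536.
constexpr double kQ16Scale = 65536.0;

// Convert a double to the nearest Q16.16 value.
std::int32_t to_q16(double v) {
    return static_cast<std::int32_t>(std::lround(v * kQ16Scale));
}

// Convert a Q16.16 value back to double.
double from_q16(std::int32_t q) {
    return static_cast<double>(q) / kQ16Scale;
}

// Gap between v and the next representable Q16.16 value above it.
double q16_step_at(double v) {
    std::int32_t q = to_q16(v);
    return from_q16(q + 1) - from_q16(q);
}
```

Here `q16_step_at(1.0)` and `q16_step_at(1000.0)` should both give 1/65536, whereas the corresponding gap for a float grows with the magnitude of the value.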

    For example in the IEEE754 32-bit floating point format there is one bit for the sign, 8 bits for the exponent and 23 bits for the mantissa.


    The difference eps between 1.0f and the next representable float above it is available as a pre-defined constant: FLT_EPSILON (from <cfloat>), or std::numeric_limits<float>::epsilon(). See also machine epsilon on Wikipedia, which discusses how epsilon relates to rounding errors.

    I.e. adding epsilon to 1.0 moves it up by exactly one representable step, which is the effect you were expecting here. (With round-to-nearest, anything larger than eps/2 already changes the sum, because the result is rounded up to 1.0 + eps.)

    The more general version of this (for numbers other than 1.0) is called 1 unit in the last place (of the mantissa). See Wikipedia's ULP article.
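As a sketch of how the ULP grows with magnitude, assuming IEEE 754 single precision: the helper `ulp_above` is a made-up name and is only meant for finite, positive inputs.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: gap between x and the next representable float
// above it, i.e. one ULP at x (for finite positive x).
float ulp_above(float x) {
    return std::nextafter(x, std::numeric_limits<float>::infinity()) - x;
}
```

Under that assumption `ulp_above(1.0f)` should equal `std::numeric_limits<float>::epsilon()`, `ulp_above(2.0f)` should be twice that, and `ulp_above(16777216.0f)` (that is, 2^24) should be 2.0f, so adding 1 at that scale no longer lands on a distinct float.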

  • 2020-12-23 16:50

    min is the smallest non-zero value that a (normalized-form) float can assume, i.e. something around 2^-126 (-126 is the minimum allowed exponent for a float); now, if you add it to 1 you'll still get 1, since a float has just 23 bits of mantissa, so such a small change cannot be represented in such a "big" number (you would need a 126 bit mantissa to see a change when adding 2^-126 to 1).

    The minimum possible change to 1, instead, is epsilon (the so-called machine epsilon), which is in fact 2^-23 - as it affects the last bit of the mantissa.
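Both powers of two can be checked with a small sketch, assuming float is IEEE 754 binary32. The function names are made up for the illustration; `std::ldexp(1.0f, n)` computes 2^n as a float.

```cpp
#include <cmath>
#include <limits>

// True if the smallest positive normalized float is exactly 2^-126.
bool min_is_two_pow_minus_126() {
    return std::numeric_limits<float>::min() == std::ldexp(1.0f, -126);
}

// True if machine epsilon for float is exactly 2^-23.
bool epsilon_is_two_pow_minus_23() {
    return std::numeric_limits<float>::epsilon() == std::ldexp(1.0f, -23);
}
```

On a platform meeting that assumption both functions should return true.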
