How to manually parse a floating point number from a string

攒了一身酷 2020-12-02 15:35

Of course, most languages have library functions for this, but suppose I want to do it myself.

Suppose that the float is given like in a C or Java program (except for

  • 2020-12-02 16:20

    It is not possible to convert every string representing a number into a double or float without losing precision. Many fractional numbers that can be represented exactly in decimal (e.g. "0.1") can only be approximated in a binary float or double. This is similar to how the fraction 1/3 cannot be represented exactly in decimal; you can only write 0.333333...
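    A quick way to see this in C, assuming double is IEEE 754 binary64 (true on mainstream platforms). Each literal below is rounded to the nearest double when compiled, so the stored values differ slightly from the decimal text; the function names are just for illustration.

```c
#include <stdio.h>

/* 0.1, 0.2 and 0.3 are each rounded to the nearest double, so the
   rounded sum of the stored 0.1 and 0.2 misses the stored 0.3. */
int tenths_add_exactly(void) {
    return 0.1 + 0.2 == 0.3;
}

/* Prints the stored values with 17 significant digits, which is
   enough to tell any two doubles apart. */
void print_stored_values(void) {
    printf("0.1       -> %.17g\n", 0.1);
    printf("0.1 + 0.2 -> %.17g\n", 0.1 + 0.2);
}
```

    On such a platform, print_stored_values() prints 0.10000000000000001 and 0.30000000000000004, and tenths_add_exactly() returns 0.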

    If you don't want to use a library function directly, why not look at the source code for those library functions? You mentioned Java; most JDKs ship with the source code for the class libraries, so you could look up how the java.lang.Double.parseDouble(String) method works. Of course, something like BigDecimal is better for controlling precision and rounding modes, but you said it needs to be a float or double.

  • 2020-12-02 16:22

    You could ignore the decimal point when parsing (except for its location). Say the input is 156.7834e10. This can easily be parsed into the integer 1567834 followed by e10, which you'd then adjust to e6, since the decimal point was 4 digits from the end of the "numeral" portion of the float.
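    A minimal sketch of that split in C. The name split_decimal is made up for illustration; it only accepts unsigned input of the form digits[.digits][e[+-]digits], keeps the mantissa in a 64-bit integer (so up to 18 digits fit), and does no overflow checking.

```c
#include <ctype.h>
#include <stdint.h>

/* Splits an unsigned decimal literal like "156.7834e10" into an integer
   mantissa and a power-of-ten exponent, so value = mantissa * 10^exp10.
   Sketch only: assumes at most 18 digits and a modest exponent
   (no overflow checks). Returns 1 on success, 0 on bad input. */
int split_decimal(const char *s, int64_t *mantissa, int *exp10) {
    int64_t m = 0;
    int digits = 0, frac_digits = 0, seen_point = 0, e = 0, e_sign = 1;

    for (; *s; s++) {
        if (isdigit((unsigned char)*s)) {
            m = m * 10 + (*s - '0');
            digits++;
            if (seen_point) frac_digits++; /* count digits after the point */
        } else if (*s == '.' && !seen_point) {
            seen_point = 1;                /* drop the point, remember it was seen */
        } else {
            break;
        }
    }
    if (digits == 0) return 0;

    if (*s == 'e' || *s == 'E') {
        s++;
        if (*s == '+' || *s == '-') e_sign = (*s++ == '-') ? -1 : 1;
        if (!isdigit((unsigned char)*s)) return 0;
        while (isdigit((unsigned char)*s)) e = e * 10 + (*s++ - '0');
    }
    if (*s != '\0') return 0;              /* trailing garbage */

    *mantissa = m;
    *exp10 = e_sign * e - frac_digits;     /* e.g. e10 minus 4 fractional digits = e6 */
    return 1;
}
```

    For example, split_decimal("156.7834e10", &m, &e) yields m = 1567834 and e = 6.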

    Precision is an issue. You'll need to check which IEEE 754 format your language's floating-point types use. If the mantissa (fraction) has more bits than your integer type, you may lose precision when someone types in a number such as:

    5123.123123e0 converts to 5123123123 with this method, which does NOT fit in a 32-bit integer, even though the same digits may fit in the mantissa of the floating-point type (a double's 53-bit significand holds them easily).
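    To make the precision point concrete, here are a few hypothetical helper checks, assuming C99 &lt;stdint.h&gt; and IEEE 754 float/double. The value 5123123123 fails fits_in_int32, survives a round trip through double, but not through float, whose significand has only 24 bits.

```c
#include <stdint.h>

/* Does m fit in a 32-bit signed integer? */
int fits_in_int32(int64_t m) {
    return m >= INT32_MIN && m <= INT32_MAX;
}

/* Do m's digits survive conversion to double (53-bit significand)
   or float (24-bit significand)? Assumes |m| < 2^62 so converting
   back to int64_t stays in range. */
int exact_in_double(int64_t m) { return (int64_t)(double)m == m; }
int exact_in_float(int64_t m)  { return (int64_t)(float)m == m; }
```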

    Of course, you could use a method that takes each digit in front of the decimal point, multiplies the current total (in a float) by 10, then adds the new digit. For digits after the decimal point, divide each digit by a growing power of 10 (i.e. multiply it by 0.1, 0.01, ...) before adding it to the current total. This method seems to beg the question of why you're doing this at all, however, as it still relies on floating-point arithmetic while avoiding the readily available parsing libraries.
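    The digit-by-digit method sketched in C (naive_atof is an illustrative name; there is no sign, exponent, or error handling). Every multiply, divide, and add rounds, so for some inputs the result may be off in the last bits compared with a correctly rounded parser such as strtod.

```c
/* Naive digit-by-digit parse of "[digits][.digits]".
   Integer part: total = total * 10 + digit.
   Fractional part: each digit is divided by a growing power of ten.
   Each step rounds, so errors can accumulate. */
double naive_atof(const char *s) {
    double total = 0.0, scale = 1.0;

    for (; *s >= '0' && *s <= '9'; s++)
        total = total * 10.0 + (*s - '0');

    if (*s == '.') {
        for (s++; *s >= '0' && *s <= '9'; s++) {
            scale *= 10.0;                 /* 10, 100, 1000, ... */
            total += (*s - '0') / scale;
        }
    }
    return total;
}
```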

    Anyway, good luck!

  • 2020-12-02 16:22

    My first thought is to parse the string into an int64 mantissa and an int decimal exponent, using only the first 18 digits of the mantissa. For example, 1.2345e-5 would be parsed into 12345 and -9. Then I would keep multiplying the mantissa by 10 and decrementing the exponent until the mantissa was 18 digits long (>56 bits of precision). Then I would look the decimal exponent up in a table to find a factor and binary exponent that can be used to convert the number from decimal n*10^m to binary p*2^q form. The factor would be another int64, so I'd multiply the mantissa by it and keep the top 64 bits of the resulting 128-bit product. This int64 mantissa can be cast to a float, losing only the necessary precision, and the 2^q exponent can be applied using multiplication with no loss of precision.
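    The normalization step described above might look like this in C. normalize18 is a hypothetical name; it assumes the parsed mantissa already has at most 18 digits, and the power-of-ten table and 128-bit multiply that come next are not shown.

```c
#include <stdint.h>

/* Given value = m * 10^e with 0 <= m < 10^18, scale m up until it has
   exactly 18 digits (10^17 <= m < 10^18, i.e. at least 57 significant
   bits), decrementing e so the value is unchanged. */
void normalize18(uint64_t *m, int *e) {
    if (*m == 0) return;                   /* zero never reaches 18 digits */
    while (*m < 100000000000000000ULL) {   /* 10^17 */
        *m *= 10;
        (*e)--;
    }
}
```

    For 1.2345e-5, parsed as 12345 and -9, it produces 123450000000000000 and -22.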

    I'd expect this to be very accurate and very fast but you may also want to handle the special numbers NaN, -infinity, -0.0 and infinity. I haven't thought about the denormalized numbers or rounding modes.
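    Handling the special values could start like this. The accepted spellings (lowercase "nan", "inf", "infinity", with an optional sign) are an assumption loosely modelled on what strtod accepts; parse_special and apply_sign are illustrative names. Note that -0.0 needs no special spelling: applying the sign by negating the parsed magnitude keeps the sign bit, whereas computing 0.0 - magnitude would give +0.0.

```c
#include <math.h>
#include <string.h>

/* Returns 1 and stores the value if s spells a special value
   (assumed spellings: "nan", "inf", "infinity", optional sign). */
int parse_special(const char *s, double *out) {
    int neg = 0;
    if (*s == '+' || *s == '-') neg = (*s++ == '-');
    if (strcmp(s, "nan") == 0) { *out = NAN; return 1; }
    if (strcmp(s, "inf") == 0 || strcmp(s, "infinity") == 0) {
        *out = neg ? -INFINITY : INFINITY;
        return 1;
    }
    return 0;
}

/* Apply the parsed sign by negation so that "-0.0" yields -0.0. */
double apply_sign(double magnitude, int neg) {
    return neg ? -magnitude : magnitude;
}
```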

  • 2020-12-02 16:22

    For that, you have to understand the IEEE 754 standard so you can build the correct binary representation. After that, you can use Float.intBitsToFloat or Double.longBitsToDouble.

    http://en.wikipedia.org/wiki/IEEE_754
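    C has no Double.longBitsToDouble, but memcpy gives the same reinterpretation. A sketch assuming double is IEEE 754 binary64 and shares byte order with uint64_t (bits_to_double and make_double are illustrative names):

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 64-bit pattern as a binary64 double
   (the C counterpart of Java's Double.longBitsToDouble). */
double bits_to_double(uint64_t bits) {
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Assemble the pattern from its fields: 1 sign bit, an 11-bit biased
   exponent (bias 1023) and a 52-bit fraction. Out-of-range fields are
   masked off, not reported. */
double make_double(unsigned sign, unsigned biased_exp, uint64_t fraction) {
    uint64_t bits = ((uint64_t)(sign & 1u) << 63)
                  | ((uint64_t)(biased_exp & 0x7FFu) << 52)
                  | (fraction & ((UINT64_C(1) << 52) - 1));
    return bits_to_double(bits);
}
```

    For example, make_double(0, 1023, 0) is 1.0 and make_double(0, 1023, 1ULL << 51) is 1.5.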

  • 2020-12-02 16:23

    All of the other answers have missed how hard it is to do this properly. You can take a first-cut approach that is accurate to a certain extent, but until you take IEEE rounding modes (et al.) into account, you will never get the right answer. I've written naive implementations before with a rather large amount of error.

    If you're not scared of math, I highly recommend reading David Goldberg's article "What Every Computer Scientist Should Know About Floating-Point Arithmetic". You'll get a better understanding of what is going on under the hood, and why the bits are laid out as they are.

    My best advice is to start with a working atoi implementation and move out from there. You'll rapidly find you're missing things, but a few looks at strtod's source will put you on the right path (which is a long, long path). Eventually you'll praise [insert deity here] that there are standard libraries.

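    /* use this to start your atof implementation */
    
    /* atoi - christopher.watford@gmail.com */
    /* PUBLIC DOMAIN */
    /* Renamed to my_atol (a placeholder name): defining a function called
       atoi clashes with the standard library's int atoi(const char *). */
    #include <ctype.h>
    #include <errno.h>
    #include <limits.h>
    
    long my_atol(const char *value) {
      const char *p = value;
      unsigned long ival = 0, limit, d;
      int neg = 0;
      while (isspace((unsigned char)*p)) p++;  /* chomp leading spaces */
      if (*p == '-' || *p == '+') {            /* chomp sign */
        neg = (*p == '-');
        p++;
      }
      /* largest magnitude allowed: LONG_MAX, or LONG_MAX + 1 for LONG_MIN */
      limit = (unsigned long)LONG_MAX + (neg ? 1UL : 0UL);
      for (; *p; p++) {                        /* parse number */
        if (!isdigit((unsigned char)*p)) return 0; /* reject junk, as before */
        d = (unsigned long)(*p - '0');
        if (ival > (limit - d) / 10) {         /* ival*10 + d would exceed limit */
          errno = ERANGE;                      /* report overflow/underflow */
          return neg ? LONG_MIN : LONG_MAX;
        }
        ival = ival * 10 + d;                  /* mult/accum */
      }
      if (!neg || ival == 0) return (long)ival;
      return -(long)(ival - 1) - 1;            /* no overflow even for LONG_MIN */
    }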
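    While following that path, it helps to check your parser against the library. A small hypothetical harness (matches_strtod is an illustrative name) that compares results bit for bit, so it also notices -0.0 versus 0.0; NaN inputs would need separate handling because NaN payloads may differ.

```c
#include <stdlib.h>
#include <string.h>

/* Returns 1 if parse(s) yields exactly the same bits as strtod(s, NULL),
   else 0. memcmp on the bytes distinguishes -0.0 from 0.0, which == does not. */
int matches_strtod(const char *s, double (*parse)(const char *)) {
    double mine = parse(s);
    double ref = strtod(s, NULL);
    return memcmp(&mine, &ref, sizeof mine) == 0;
}
```

    As a sanity check, matches_strtod("156.7834e10", atof) should return 1, since atof is specified to behave like strtod(s, NULL) apart from error handling; pass your own parser with the same signature instead.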
    