Of course most languages have library functions for this, but suppose I want to do it myself.
Suppose that the float is given like in a C or Java program (except for
Using a state machine. It's fairly easy to do, and even works if the data stream is interrupted (you just have to keep the state and the partial result). You can also use a parser generator (if you're doing something more complex).
I would directly assemble the floating point number using its binary representation.
Read in the number one character after another and first find all digits. Do that in integer arithmetic. Also keep track of the decimal point and the exponent. This one will be important later.
Now you can assemble your floating point number. The first thing to do is to scan the integer representation of the digits for the first set one-bit (highest to lowest).
The bits immediately following the first one-bit are your mantissa.
Getting the exponent isn't hard either. You know the first one-bit position, the position of the decimal point and the optional exponent from the scientific notation. Combine them and add the floating point exponent bias (I think it's 127, but check some reference please).
This exponent should be somewhere in the range of 0 to 255. If it's larger or smaller you have a positive or negative infinite number (special case).
Store the exponent as it into the bits 24 to 30 of your float.
The most significant bit is simply the sign. One means negative, zero means positive.
It's harder to describe than it really is, try to decompose a floating point number and take a look at the exponent and mantissa and you'll see how easy it really is.
Btw - doing the arithmetic in floating point itself is a bad idea because you will always force your mantissa to be truncated to 23 significant bits. You won't get a exact representation that way.
Yes, you can decompose the construction into floating point operations as long as these operations are EXACT, and you can afford a single final inexact operation.
Unfortunately, floating point operations soon become inexact, when you exceed precision of mantissa, the results are rounded. Once a rounding "error" is introduced, it will be cumulated in further operations...
So, generally, NO, you can't use such naive algorithm to convert arbitrary decimals, this may lead to an incorrectly rounded number, off by several ulp of the correct one, like others have already told you.
BUT LET'S SEE HOW FAR WE CAN GO:
If you carefully reconstruct the float like this:
if(biasedExponent >= 0)
return integerMantissa * (10^biasedExponent);
else
return integerMantissa / (10^(-biasedExponent));
there is a risk to exceed precision both when cumulating the integerMantissa if it has many digits, and when raising 10 to the power of biasedExponent...
Fortunately, if first two operations are exact, then you can afford a final inexact operation * or /, thanks to IEEE properties, the result will be rounded correctly.
Let's apply this to single precision floats which have a precision of 24 bits.
10^8 > 2^24 > 10^7
Noting that multiple of 2 will only increase the exponent and leave the mantissa unchanged, we only have to deal with powers of 5 for exponentiation of 10:
5^11 > 2^24 > 5^10
Though, you can afford 7 digits of precision in the integerMantissa and a biasedExponent between -10 and 10.
In double precision, 53 bits,
10^16 > 2^53 > 10^15
5^23 > 2^53 > 5^22
So you can afford 15 decimal digits, and a biased exponent between -22 and 22.
It's up to you to see if your numbers will always fall in the correct range... (If you are really tricky, you could arrange to balance mantissa and exponent by inserting/removing trailing zeroes).
Otherwise, you'll have to use some extended precision.
If your language provides arbitrary precision integers, then it's a bit tricky to get it right, but not that difficult, I did this in Smalltalk and blogged about it at http://smallissimo.blogspot.fr/2011/09/clarifying-and-optimizing.html and http://smallissimo.blogspot.fr/2011/09/reviewing-fraction-asfloat.html
Note that these are simple and naive implementations. Fortunately, libc is more optimized.
If you want the most precise result possible, you should use a higher internal working precision, and then downconvert the result to the desired precision. If you don't mind a few ULPs of error, then you can just repeatedly multiply by 10 as necessary with the desired precision. I would avoid the pow() function, since it will produce inexact results for large exponents.
The "standard" algorithm for converting a decimal number to the best floating-point approximation is William Clinger's How to read floating point numbers accurately, downloadable from here. Note that doing this correctly requires multiple-precision integers, at least a certain percentage of the time, in order to handle corner cases.
Algorithms for going the other way, printing the best decimal number from a floating-number, are found in Burger and Dybvig's Printing Floating-Point Numbers Quickly and Accurately, downloadable here. This also requires multiple-precision integer arithmetic
See also David M Gay's Correctly Rounded Binary-Decimal and Decimal-Binary Conversions for algorithms going both ways.
I agree with terminus. A state machine is the best way to accomplish this task as there are many stupid ways a parser can be broken. I am working on one now, I think it is complete and it has I think 13 states.
The problem is not trivial.
I am a hardware engineer interested designing floating point hardware. I am on my second implementation.
I found this today http://speleotrove.com/decimal/decarith.pdf
which on page 18 gives some interesting test cases.
Yes, I have read Clinger's article, but being a simple minded hardware engineer, I can't get my mind around the code presented. The reference to Steele's algorithm as asnwered in Knuth's text was helpful to me. Both input and output are problematic.
All of the aforementioned references to various articles are excellent.
I have yet to sign up here just yet, but when I do, assuming the login is not taken, it will be broh. (broh-dot).
Clyde