问题
I am thinking recently on how floating point math works on computers and is hard for me understand all the tecnicals details behind the formulas. I would need to understand the basics of addition, subtraction, multiplication, division and remainder. With these I will be able to make trig functions and formulas.
I can guess something about it, but its a bit unclear. I know that a fixed point can be made by separating a 4 byte integer by a signal flag, a radix and a mantissa. With this we have a 1 bit flag, a 5 bits radix and a 10 bit mantissa. A word of 32 bits is perfect for a floating point value :)
To make an addition between two floats, I can simply try to add the two mantissas and add the carry to the 5 bits radix? This is a way to do floating point math (or fixed point math, to be true) or I am completely wrong?
All the explanations I saw use formulas, multiplications, etc. and they look so complex for a thing I guess, would be a bit more simple. I would need an explanation more directed to beginning programmers and less to mathematicians.
回答1:
The radix depends of the representation, if you use radix r=2 you can never change it, the number doesn't even have any data that tell you which radix have. I think you're wrong and you mean exponent.
To add two numbers in floating point you must make the exponent one equal to another by rotating the mantissa. One bit right means exponent+1, and one bit left means exponent -1, when you have the numbers with the same exponent then you can add them.
Value(x) = mantissa * radix ^ exponent
adding these two numbers
101011 * 2 ^ 13
001011 * 2 ^ 12
would be the same as adding:
101011 * 2 ^ 13
000101 * 2 ^ 13
After making exponent equal one to another you can operate. You also have to know if the representation has implicit bit, I mean, the most significant bit must be a 1, so usually, as in the iee standard its known to be there, but it isn't representated, although its used to operate.
I know this can be a bit confusing and I'm not the best teacher so any doubt you have, just ask.
回答2:
See Anatomy of a floating point number
回答3:
Run, don't walk, to get Knuth's Seminumerical Algorithms which contains wonderful intuition and algorithms behind doing multiprecision and floating point arithmetic.
来源:https://stackoverflow.com/questions/3199166/floating-point-algorithms-in-c