Floating Point Algorithms in C

Submitted by ☆樱花仙子☆ on 2019-12-12 11:35:07

Question


I have recently been thinking about how floating point math works on computers, and it is hard for me to understand all the technical details behind the formulas. I need to understand the basics of addition, subtraction, multiplication, division and remainder. With these I will be able to build trig functions and formulas.

I can guess some of it, but it's a bit unclear. I know that a fixed point number can be made by splitting a 4-byte integer into a sign flag, a radix and a mantissa. With this we have a 1-bit flag, a 5-bit radix and a 10-bit mantissa. A word of 32 bits is perfect for a floating point value :)

To add two floats, can I simply add the two mantissas and add the carry to the 5-bit radix? Is this a way to do floating point math (or fixed point math, to be accurate), or am I completely wrong?

All the explanations I have seen use formulas, multiplications, etc., and they look so complex for something that, I guess, should be a bit simpler. I need an explanation aimed more at beginning programmers and less at mathematicians.


Answer 1:


The radix depends on the representation. If you use radix r = 2 you can never change it; the number doesn't even carry any data that tells you which radix it has. I think you are mistaken and you mean the exponent.

To add two floating point numbers you must first make their exponents equal by shifting (not rotating) the mantissa of one of them. Shifting the mantissa one bit right means exponent + 1, and one bit left means exponent - 1. Once both numbers have the same exponent you can add the mantissas.

Value(x) = mantissa * radix ^ exponent

Adding these two numbers:

    101011 * 2 ^ 13
    001011 * 2 ^ 12

is the same as adding the following (the lowest bit of the second mantissa is shifted out and lost):

    101011 * 2 ^ 13
    000101 * 2 ^ 13
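The alignment step above can be sketched in C. This is only a toy illustration, not the IEEE 754 algorithm: the `toy_float` type, the `toy_add` function and the 6-bit mantissa width are hypothetical names and sizes chosen to match the example, and the sketch ignores signs, rounding and the implicit bit (bits shifted out are simply truncated).

```c
#include <assert.h>
#include <stdint.h>

/* Toy format, for illustration only: value = mantissa * 2^exponent. */
enum { MANT_BITS = 6 };            /* assumed mantissa width, as in the example */

typedef struct {
    uint32_t mantissa;             /* unsigned, must fit in MANT_BITS bits */
    int      exponent;             /* power of two applied to the mantissa */
} toy_float;

/* Add two non-negative toy floats by aligning their exponents first. */
toy_float toy_add(toy_float a, toy_float b)
{
    assert((a.mantissa >> MANT_BITS) == 0);
    assert((b.mantissa >> MANT_BITS) == 0);

    /* Make 'a' the operand with the larger exponent. */
    if (a.exponent < b.exponent) {
        toy_float tmp = a;
        a = b;
        b = tmp;
    }

    /* Shift the smaller operand right until the exponents match.
       Each right shift is exponent + 1; shifted-out bits are lost.
       Shifting by 32 or more is undefined in C, so clamp to zero. */
    int shift = a.exponent - b.exponent;
    b.mantissa = (shift < 32) ? (b.mantissa >> shift) : 0;
    b.exponent = a.exponent;

    toy_float sum = { a.mantissa + b.mantissa, a.exponent };

    /* A carry out of the mantissa field: shift right once more
       and bump the exponent so the result fits again. */
    if (sum.mantissa >> MANT_BITS) {
        sum.mantissa >>= 1;
        sum.exponent += 1;
    }
    return sum;
}
```

Calling `toy_add` with `{43, 13}` (binary 101011) and `{11, 12}` (binary 001011) aligns the second operand to 000101 * 2 ^ 13 and returns mantissa 48 (binary 110000) with exponent 13. The carry is handled by adjusting the exponent after the mantissas are added, which is the part the question was guessing at.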

Once the exponents are equal you can operate. You also have to know whether the representation has an implicit bit. In a normalized number the most significant bit of the mantissa must be 1, so, as in the IEEE standard, it is usually assumed to be there: it is not stored in the representation, although it is used in the arithmetic.
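To see the implicit bit in practice, here is a minimal sketch that splits a C `float` into its fields. It assumes `float` is an IEEE 754 single precision value (1 sign bit, 8 exponent bits, 23 stored mantissa bits), which is common but not guaranteed by the C standard. The `float_fields` type and `decode_float` function are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Fields of a float, assuming the IEEE 754 single precision layout. */
typedef struct {
    unsigned sign;         /* 1 bit: 0 = positive, 1 = negative        */
    unsigned exponent;     /* 8-bit exponent field, biased by 127       */
    uint32_t fraction;     /* the 23 mantissa bits that are stored      */
    uint32_t significand;  /* fraction with the implicit bit restored   */
} float_fields;

float_fields decode_float(float x)
{
    uint32_t bits;
    assert(sizeof bits == sizeof x);   /* sketch assumes a 32-bit float */
    memcpy(&bits, &x, sizeof bits);    /* copy the raw bit pattern      */

    float_fields f;
    f.sign     = (unsigned)(bits >> 31);
    f.exponent = (unsigned)((bits >> 23) & 0xFFu);
    f.fraction = bits & 0x7FFFFFu;

    /* Normal numbers (exponent field 1..254) have a leading 1 that is
       not stored. Zero and subnormals (field 0) have no implicit bit;
       field 255 marks infinity or NaN, where it has no meaning. */
    if (f.exponent != 0 && f.exponent != 255)
        f.significand = f.fraction | (UINT32_C(1) << 23);
    else
        f.significand = f.fraction;
    return f;
}
```

For `1.0f` this gives exponent field 127, fraction 0 and significand 0x800000: every stored mantissa bit is zero, and the value comes entirely from the implicit leading 1.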

I know this can be a bit confusing and I'm not the best teacher, so if you have any doubts, just ask.




Answer 2:


See Anatomy of a floating point number




Answer 3:


Run, don't walk, to get Knuth's Seminumerical Algorithms, which contains wonderful intuition and algorithms for doing multiprecision and floating point arithmetic.



Source: https://stackoverflow.com/questions/3199166/floating-point-algorithms-in-c
