Normalization part of a code of Packing a Float (IEEE-754) into uint64_t

Submitted by 大憨熊 on 2019-12-25 01:14:22

Question


I have been researching a portable way to store a float in a binary format (in a uint64_t), so that it can be shared over a network with various microcontrollers. It should be independent of the float's memory layout and of the system's endianness.

I came across this answer. However, I am unable to understand a few lines of the code, which are shown below:

while(fnorm >= 2.0) { fnorm /= 2.0; shift++; }
while(fnorm < 1.0) { fnorm *= 2.0; shift--; }
fnorm = fnorm - 1.0;

// calculate the binary form (non-float) of the significand data
significand = fnorm * ((1LL<<significandbits) + 0.5f);

I am aware that the code above tries to normalize the significand. The first line in the above code fragment is trying to get the exponent of the float. I am not sure why the second, third and fourth lines are necessary. I understand that the second and third lines make the fnorm variable lie between 0.0 and 1.0, but why is that necessary? Does having fnorm (in decimal) between 0.0 and 1.0 ensure that its binary representation is 1.xxxxxx...?

Please help me understand what each step is trying to achieve and how it achieves it. I want to understand how it changes the bit pattern of the float variable to get a normalized significand (left-most bit set to 1).


Answer 1:


The while loops adjust the exponent (tracked in shift) so that the first binary 1 of fnorm sits just before the binary point.
So fnorm is at most 1.1111111... in base 2, which is almost 2.0 in base 10.
And fnorm is at least 1.000000... in base 2, which is 1.0 in base 10.
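
To make the effect of the two loops concrete, here is a minimal sketch that wraps them in a function. The name normalize and its signature are hypothetical (they are not part of the quoted code), and it assumes a positive, finite input; zero, negative values, NaN and infinity would need separate handling.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical helper: scale a positive, finite value into [1.0, 2.0).
   *shift receives the number of halvings (positive) or doublings
   (negative) that were needed, i.e. the base-2 exponent. */
static double normalize(double f, int *shift)
{
    assert(f > 0.0 && f <= DBL_MAX);   /* assumption: positive and finite */

    double fnorm = f;
    *shift = 0;
    while (fnorm >= 2.0) { fnorm /= 2.0; (*shift)++; }  /* too big: halve   */
    while (fnorm < 1.0)  { fnorm *= 2.0; (*shift)--; }  /* too small: double */
    return fnorm;                       /* now 1.0 <= fnorm < 2.0 */
}
```

For example, 6.5 is 110.1 in base 2; two halvings give 1.101 (1.625 in base 10) with shift equal to 2, so 6.5 = 1.625 * 2^2.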

In IEEE 754, the significand of a normalised (not subnormal) number has the form 1.xxxxxx... (base 2), which is exactly what the loops above produce.
The first bit, before the binary point, is always 1, which is why it does not need to be stored.
(Maybe this last remark is the main point of your question.)

After normalisation, your algorithm subtracts 1.0, which leaves 0.xxxxx..., as you saw.
Subtracting 1.0 does not lose any information, as long as we remember that this subtraction is always performed.
Multiplying this float value (strictly less than 1.0, but not negative) by the integer 1LL<<significandbits gives a float which is strictly less than this big integer.
Thus, converting it into an integer gives a value that fits in significandbits bits without overflowing.
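(I guess the 0.5 term is meant to help round the last bit; note that, as written, it is added to 1LL<<significandbits before the multiplication, not to the product.)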
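
As an illustration of that multiplication step, the following sketch converts the fractional part into an integer field. The function name fraction_to_bits is hypothetical, and the sketch deliberately leaves out the 0.5 term from the quoted code so that the "strictly less than 1<<bits" bound is easy to check. It also assumes at most 52 bits, so that the scale factor and the product are exactly representable in a double.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: turn the fractional part 0.xxxxx... (base 2)
   into an integer holding its first `bits` binary digits. */
static uint64_t fraction_to_bits(double frac, unsigned bits)
{
    assert(frac >= 0.0 && frac < 1.0);   /* result of fnorm - 1.0 */
    assert(bits >= 1 && bits <= 52);     /* assumption: fits a double exactly */

    /* frac < 1.0, so frac * 2^bits < 2^bits: the truncated result
       always fits in `bits` bits. */
    return (uint64_t)(frac * (double)(1ULL << bits));
}
```

With frac = 0.625 (0.101 in base 2) and bits = 3, the result is 5, which is 101 in base 2: the digits after the binary point, read as an integer.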

This integer contains all the significant bits that were originally in the significand of the floating-point value.
Knowing it, the shift, and the sign makes it possible to reconstruct the original floating-point value.
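
The reconstruction can be sketched as the reverse of the packing steps. The function rebuild and its parameters are hypothetical; in particular it takes the shift directly as a signed integer, whereas a real packed format would typically store it as a biased exponent field. It again assumes at most 52 significand bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: rebuild the value from the integer significand,
   the shift found during normalisation, and a sign flag. */
static double rebuild(uint64_t significand, unsigned bits, int shift, int negative)
{
    assert(bits >= 1 && bits <= 52);
    assert(significand < (1ULL << bits));

    /* Back to 0.xxxxx..., then restore the implicit leading 1. */
    double result = (double)significand / (double)(1ULL << bits);
    result += 1.0;

    /* Undo the normalisation: apply 2^shift. */
    while (shift > 0) { result *= 2.0; shift--; }
    while (shift < 0) { result /= 2.0; shift++; }

    return negative ? -result : result;
}
```

For instance, significand 5 with bits = 3 and shift = 2 gives 0.625, then 1.625, then 1.625 * 4 = 6.5.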

But, as suggested in the comments, since the IEEE 754 bit pattern is well defined, all of this may not be necessary.
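
One possible reading of that suggestion, as a sketch only: if every platform involved is known to use IEEE 754 binary64 for double, with the same byte order for doubles as for 64-bit integers, the bit pattern can be copied into a uint64_t with memcpy. Those are assumptions about the target platforms, not guarantees of the C standard, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers. Assumption: double is IEEE 754 binary64 and is
   stored with the same byte order as uint64_t on this platform. */
static uint64_t double_to_bits(double d)
{
    uint64_t u;
    assert(sizeof d == sizeof u);
    memcpy(&u, &d, sizeof u);   /* copy the bytes without reinterpreting them */
    return u;
}

static double bits_to_double(uint64_t u)
{
    double d;
    assert(sizeof d == sizeof u);
    memcpy(&d, &u, sizeof d);
    return d;
}
```

The resulting integer can then be sent byte by byte in a fixed order (for example, most significant byte first, extracted with shifts), which keeps the wire format independent of each system's endianness. On a platform where the assumption does not hold, the arithmetic approach described above remains the portable option.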



Source: https://stackoverflow.com/questions/57168435/normalization-part-of-a-code-of-packing-a-float-ieee-754-into-uint64-t
