Question
I have been researching a portable way to store a float in a binary format (in a uint64_t), so that it can be shared over a network with various microcontrollers. It should be independent of the float's memory layout and of the endianness of the system.
I came across this answer. However, I am unable to understand a few lines of the code, which are shown below:
while(fnorm >= 2.0) { fnorm /= 2.0; shift++; }
while(fnorm < 1.0) { fnorm *= 2.0; shift--; }
fnorm = fnorm - 1.0;
// calculate the binary form (non-float) of the significand data
significand = fnorm * ((1LL<<significandbits) + 0.5f);
I am aware that the code above tries to normalize the significand. The first line in the above code fragment is trying to get the exponent of the float. I am not sure why the second, third and fourth lines are necessary. I understand that the second and third lines try to make the fnorm variable lie between 0.0 and 1.0, but why is that necessary? Does having fnorm (in decimal format) between 0.0 and 1.0 make sure its binary representation will be 1.xxxxxx...?
Please help me understand what each step is trying to achieve and how it achieves it. I want to understand how it changes the bit pattern of the float variable to get a normalized significand (left-most bit set to 1).
Answer 1:
The while loops adjust the exponent in order to place the first binary 1 of fnorm just before the dot (in base 2).
So fnorm is at most 1.1111111... in base 2, which is almost 2.0 in base 10.
It is at least 1.000000... in base 2, which is 1.0 in base 10.
In IEEE 754, the significand of a normalised number (not subnormal) has the form 1.xxxxxx... (base 2), which is exactly what the previous loops produce.
The first bit, before the dot, is always 1, which is why it does not need to be stored.
(Maybe this last remark is the main point of your question.)
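To see the loops in isolation, here is a minimal sketch. The helper name normalize_positive is made up for illustration, and it assumes a positive, finite input (zero, negative values, infinities and NaN are left out). For example, 6.5 becomes 1.625 with a shift of 2, since 6.5 = 1.625 * 2^2, and 1.625 is 1.101 in base 2.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical helper (not from the quoted code): scales a positive, finite
   value into [1.0, 2.0) and reports the power of two that was factored out. */
static double normalize_positive(double f, int *shift)
{
    /* Assumption: f is positive and finite; NaN fails the first comparison. */
    assert(f > 0.0 && f <= DBL_MAX);

    double fnorm = f;
    *shift = 0;
    while (fnorm >= 2.0) { fnorm /= 2.0; (*shift)++; }  /* too big: halve   */
    while (fnorm < 1.0)  { fnorm *= 2.0; (*shift)--; }  /* too small: double */

    /* At this point f == fnorm * 2^shift and 1.0 <= fnorm < 2.0. */
    return fnorm;
}
```

Dividing or multiplying by 2.0 only changes the exponent, so as long as no intermediate value becomes subnormal, the significand bits of the input are left untouched.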
After normalisation, your algorithm subtracts 1.0, which leads to 0.xxxxx... as you saw.
Subtracting 1.0 does not lose any information as long as we remember that this subtraction is systematic.
Multiplying this float value (strictly less than 1.0, but not negative) by the integer 1LL<<significandbits gives a float which is strictly less than this big integer.
Thus, converting it into an integer gives a value that fits in significandbits bits.
(I guess the 0.5 increment helps with rounding the last bit.)
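The subtraction and multiplication can be sketched as below. The helper name significand_field is hypothetical, and it assumes fnorm has already been normalized into [1.0, 2.0). The expression is the one quoted in the question; note that its parentheses attach 0.5f to the scale factor rather than to the product.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper mirroring the quoted line: turns a normalized value
   (assumed to be in [1.0, 2.0)) into an integer significand field. */
static uint64_t significand_field(double fnorm, unsigned significandbits)
{
    assert(fnorm >= 1.0 && fnorm < 2.0);
    assert(significandbits > 0 && significandbits < 63);

    double frac = fnorm - 1.0;  /* drop the implicit leading 1: 0.xxxx... */

    /* Same expression as in the question; the cast truncates toward zero. */
    return (uint64_t)(frac * ((1LL << significandbits) + 0.5f));
}
```

With 23 significand bits, 1.625 (1.101 in base 2) gives 0x500000, that is the bits 101 followed by twenty zeros: the fractional part shifted left into an integer.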
This integer contains all the significant bits that were originally in the significand of the floating point value.
Knowing it, the shift, and the sign makes it possible to reconstitute the original floating point value.
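As an illustration of that reconstitution, here is a minimal sketch of the inverse steps. The function name rebuild and its parameters are assumptions made for this example; it is not the unpacking routine from the linked answer, and it ignores zero, subnormals, infinities and NaN.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical inverse of the normalization: rebuilds the value from the
   sign flag, the shift (unbiased exponent) and the integer significand. */
static double rebuild(int sign, int shift, uint64_t significand,
                      unsigned significandbits)
{
    assert(significandbits > 0 && significandbits < 63);

    /* Back to a fraction in [0.0, 1.0): 0.xxxx... */
    double result = (double)significand / (double)(1ULL << significandbits);
    result += 1.0;                                  /* restore the implicit 1 */

    while (shift > 0) { result *= 2.0; shift--; }   /* undo the halvings  */
    while (shift < 0) { result /= 2.0; shift++; }   /* undo the doublings */

    return sign ? -result : result;
}
```

For instance, rebuild(0, 2, 0x500000, 23) yields 6.5 again: 0x500000 / 2^23 is 0.625, adding 1.0 gives 1.625, and two doublings give 6.5.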
But, as suggested in the comments, since the IEEE 754 bit pattern is well defined, all of this may not be necessary.
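A rough sketch of that alternative follows. It assumes that float on every platform involved is a 32-bit IEEE 754 binary32 value and that floats and integers share the same byte order on each platform; neither is guaranteed by the C standard. The helper names float_to_bits, bits_to_float and put_be32 are made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: float is IEEE 754 binary32. Copies its bit pattern into an
   integer without violating aliasing rules. */
static uint32_t float_to_bits(float f)
{
    uint32_t u;
    assert(sizeof u == sizeof f);   /* float must be 32 bits wide */
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Inverse of float_to_bits, under the same assumption. */
static float bits_to_float(uint32_t u)
{
    float f;
    assert(sizeof f == sizeof u);
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Writes the integer in big-endian byte order using shifts, so the bytes
   on the wire do not depend on the host's integer endianness. */
static void put_be32(uint8_t out[4], uint32_t u)
{
    out[0] = (uint8_t)(u >> 24);
    out[1] = (uint8_t)(u >> 16);
    out[2] = (uint8_t)(u >> 8);
    out[3] = (uint8_t)u;
}
```

On a platform where the assumptions hold, float_to_bits(6.5f) is 0x40D00000: sign 0, biased exponent 129 (2 + 127), and the same 101000... significand field as in the manual approach.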
Source: https://stackoverflow.com/questions/57168435/normalization-part-of-a-code-of-packing-a-float-ieee-754-into-uint64-t