Can we use any value in floating point for customized flags?

Question

I am writing C++98 code on 64-bit Linux (RHEL).

I have an array of floating-point values, and I want to mark some of them as 'invalid'. One possible solution is to use a separate bit array that tells whether the corresponding value is valid.

I was wondering whether we can use a special double value instead. The question "Why does IEEE 754 reserve so many NaN values?" says that there are a lot of NaN values. Can I use any of these reserved values for my problem?

I only need one bit in the payload to indicate whether a double value is 'valid' by my definition. The input double values can contain NaN, but I assume those NaNs would not use any payload.
The double values will be saved to a file in binary mode (the bits of each double are written unchanged).
The code then reads the data back from the file in binary mode.
For each double value read from the file, we first check whether it is valid by testing the bit set in the payload, before doing any other calculation.

Answer 1:

Generally speaking, IEEE-754 allows, but does not require, support for NaN payloads. However, here we have the specific case of x64 systems, and the relevant processors from AMD and Intel support NaN payloads.

IEEE Std 754-2008 further specifies that, in NaN encodings, the most significant fraction bit of the mantissa distinguishes between quiet and signalling NaNs. This corresponds to the most significant stored mantissa bit for the single- and double-precision types. It follows that one cannot use this bit for custom encoding purposes. x64 processors generate the specific QNaN INDEFINITE in response to various exceptional situations, and the sign bit of the QNaN encoding is used for that, so the sign bit is also off-limits for custom NaN-based flagging.
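
To make the off-limits bits concrete, here is a minimal sketch of the double-precision field layout described above. The names (kSignMask, bits_of, and so on) are my own and purely illustrative. It assumes that double is the 64-bit IEEE-754 format, that <stdint.h> is available (it is a C99 header, not part of C++98 proper, but GCC on Linux provides it), and that the compiler accepts the ULL suffix as an extension.

```cpp
#include <cassert>
#include <cstring>   // std::memcpy
#include <stdint.h>  // uint64_t (C99 header; assumed available, as with GCC on Linux)

// Field masks for an IEEE-754 double-precision value (illustrative names).
static const uint64_t kSignMask     = 0x8000000000000000ULL; // bit 63
static const uint64_t kExponentMask = 0x7FF0000000000000ULL; // bits 62:52
static const uint64_t kFractionMask = 0x000FFFFFFFFFFFFFULL; // bits 51:0
static const uint64_t kQuietBit     = 0x0008000000000000ULL; // bit 51, quiet vs. signalling

// Bits that remain for a custom payload once the sign bit and the
// quiet bit are excluded: bits 50:0.
static const uint64_t kPayloadMask  = kFractionMask & ~kQuietBit;

// Copy the object representation of a double into an integer.
inline uint64_t bits_of(double d) {
    assert(sizeof(double) == sizeof(uint64_t)); // assumption: 64-bit double
    uint64_t u;
    std::memcpy(&u, &d, sizeof u);
    return u;
}

// A NaN has an all-ones exponent field and a non-zero fraction field.
inline bool is_nan_bits(uint64_t u) {
    return (u & kExponentMask) == kExponentMask && (u & kFractionMask) != 0;
}

// A quiet NaN is a NaN whose most significant fraction bit is set.
inline bool is_quiet_nan_bits(uint64_t u) {
    return is_nan_bits(u) && (u & kQuietBit) != 0;
}
```

With these definitions, the double-precision QNaN INDEFINITE pattern 0xFFF8000000000000 is classified as a quiet NaN whose payload bits are all zero, with only the sign bit and the quiet bit set.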

Various toolchains provide relaxed, non-IEEE-754 compliant "fast math" modes, in which propagation of NaNs is not guaranteed. You would need to compile with the strictest floating-point setting (e.g. Intel compiler -fp-model strict) to ensure the custom flagging does not get lost. Various software environments use NaN payloads to encode the particular event that gave rise to the creation of a NaN (Apple's SANE is a historical example of such a system). In my experience, such systems typically utilize low-order bits of the mantissa portion of a NaN encoding.

This would suggest that high-order mantissa bits, say, bits 50:48 of an IEEE-754 double-precision number or bits 21:19 of an IEEE-754 single-precision number, are the best place for custom flags inside a NaN encoding (leaving untouched the most significant mantissa bit, as mentioned). Transporting data through both float and double types may be problematic, because propagation of NaN payloads between different floating-point types is not specified by the x64 architecture, as best I can tell from reviewing AMD's original x64 architecture specification and Intel's latest documentation. Purely empirically, I find that NaN payloads are handled such that bit [n] of a single-precision encoding appears as bit [n+29] of the double-precision encoding, and vice versa.
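
The offset of 29 is simply the difference between the stored mantissa widths (52 - 23), which is why bits 21:19 of a float line up with bits 50:48 of a double. The sketch below spells out that arithmetic in integer space; widen_nan_bits and convert_via_hardware are hypothetical helper names of mine. The second helper performs a real float-to-double conversion so the empirical observation can be checked on a given machine. Its result is whatever the platform produces and is not guaranteed by any specification.

```cpp
#include <cassert>
#include <cstring>   // std::memcpy
#include <stdint.h>  // uint32_t, uint64_t (C99 header; assumed available)

// Stored mantissa widths: 23 bits for single, 52 bits for double precision.
static const int kFloatFractionBits  = 23;
static const int kDoubleFractionBits = 52;
static const int kPayloadShift = kDoubleFractionBits - kFloatFractionBits; // 29

// Model of the observed behaviour, done purely on integers: keep the sign,
// force the double exponent field to all ones, and left-align the float
// mantissa bits so that float bit n becomes double bit n + 29.
// Only meaningful when f encodes a NaN.
inline uint64_t widen_nan_bits(uint32_t f) {
    uint64_t sign     = (uint64_t)(f >> 31) << 63;
    uint64_t fraction = (uint64_t)(f & 0x007FFFFFu) << kPayloadShift;
    return sign | 0x7FF0000000000000ULL | fraction;
}

// Let the machine do the conversion and return the resulting bit pattern.
// The volatile object discourages the compiler from folding the conversion
// at compile time. The outcome is platform-dependent.
inline uint64_t convert_via_hardware(uint32_t f) {
    assert(sizeof(float) == sizeof(uint32_t) && sizeof(double) == sizeof(uint64_t));
    float x;
    std::memcpy(&x, &f, sizeof x);
    volatile float vx = x;
    double d = vx;
    uint64_t u;
    std::memcpy(&u, &d, sizeof u);
    return u;
}
```

For example, the float quiet NaN 0x7FC80000 (quiet bit 22 plus custom bit 19) maps to 0x7FF9000000000000 under this model (quiet bit 51 plus custom bit 48). Comparing convert_via_hardware(0x7FC80000u) against widen_nan_bits(0x7FC80000u) on the target machine is one way to confirm whether the empirical behaviour holds there.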

Given the constraints on the programming language, it will be best to use memcpy() to transfer between floating-point and unsigned integer representations, and to perform the required bit-level operations to set, clear, and test custom NaN payloads in integer space. Many optimizing compilers will optimize the memcpy() away and replace it with hardware instructions that transfer data between x64 floating-point and integer registers, but you would want to double-check the generated machine code to make sure of that if the performance of these operations matters.
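
As a rough sketch of that approach, the following sets, tests, and clears a single 'invalid' flag in integer space. The choice of bit 48 and all names (kInvalidFlag, make_invalid, is_invalid, clear_invalid) are illustrative assumptions of mine, not an established API. As before, it assumes a 64-bit IEEE-754 double and the availability of <stdint.h>.

```cpp
#include <cassert>
#include <cstring>   // std::memcpy
#include <stdint.h>  // uint64_t (C99 header; assumed available)

static const uint64_t kExponentMask = 0x7FF0000000000000ULL; // bits 62:52
static const uint64_t kQuietBit     = 0x0008000000000000ULL; // bit 51, left set
static const uint64_t kInvalidFlag  = 0x0001000000000000ULL; // bit 48, hypothetical custom flag

// Move between the two representations with memcpy, which avoids the
// aliasing problems of pointer casts.
inline uint64_t to_bits(double d) {
    assert(sizeof(double) == sizeof(uint64_t)); // assumption: 64-bit double
    uint64_t u;
    std::memcpy(&u, &d, sizeof u);
    return u;
}

inline double from_bits(uint64_t u) {
    double d;
    std::memcpy(&d, &u, sizeof d);
    return d;
}

// A positive quiet NaN that carries the custom flag.
inline double make_invalid() {
    return from_bits(kExponentMask | kQuietBit | kInvalidFlag);
}

// True only for NaNs with the custom flag set. An ordinary NaN without
// the flag, an infinity, or any finite value yields false.
inline bool is_invalid(double d) {
    uint64_t u = to_bits(d);
    return (u & kExponentMask) == kExponentMask && (u & kInvalidFlag) != 0;
}

// Remove the flag from a flagged NaN. The quiet bit is forced on so the
// result stays a NaN rather than turning into an infinity.
inline double clear_invalid(double d) {
    if (!is_invalid(d)) return d;
    return from_bits((to_bits(d) & ~kInvalidFlag) | kQuietBit);
}
```

Because the flag lives in the stored bit pattern, writing the doubles to a file in binary mode and reading them back preserves it, so is_invalid() can be applied directly to each value read, before any arithmetic is done on it.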

Source: https://stackoverflow.com/questions/45174949/can-we-use-any-value-in-floating-point-for-customized-flags

Tags

floating-point

nan

ieee-754