All the examples I've seen of reading a double of known endianness from a buffer to the platform endianness involve detecting the current platform's endianness and performing byte-swapping when necessary.
On the other hand, I've seen the same thing done for integers using bit shifting instead (one such example).
This got me thinking that it might be possible to use a union and the bitshift technique to read doubles (and floats) from buffers, and a quick test implementation seemed to work (at least with clang on x86_64):
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

double read_double(char * buffer, bool le) {
    union {
        double d;
        uint64_t i;
    } data;
    data.i = 0;
    int off = le ? 0 : 7;
    int add = le ? 1 : -1;
    for (int i = 0; i < 8; i++) {
        data.i |= ((uint64_t)(buffer[off] & 0xFF) << (i * 8));
        off += add;
    }
    return data.d;
}

int main() {
    char buffer_le[] = {0x6E, 0x86, 0x1B, 0xF0, 0xF9, 0x21, 0x09, 0x40};
    printf("%f\n", read_double(buffer_le, true)); // 3.141590
    char buffer_be[] = {0x40, 0x09, 0x21, 0xF9, 0xF0, 0x1B, 0x86, 0x6E};
    printf("%f\n", read_double(buffer_be, false)); // 3.141590
    return 0;
}
My question though is, is this a safe way to do this? Or is there undefined behavior involved here? Or if both this and the byte-swap method involve undefined behavior, is one safer than the other?
Reinterpreting Through a Union
Constructing a uint64_t value by shifting and ORing bytes is of course supported by the C standard. (There is some hazard when shifting due to the need to ensure the left operand is the correct size and type to avoid issues with overflow and shift width, but the code in the question correctly converts to uint64_t before shifting.) Then the question remaining for the code is whether reinterpreting through a union is permitted by the C standard. The answer is yes.
C 2018 6.5.2.3 3 says:
A postfix expression followed by the . operator and an identifier designates a member of a structure or union object. The value is that of the named member,99)…
and note 99 says:
If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning")…
Such reinterpretation of course relies on the object representations used in the C implementation. Notably the double must use the expected format, matching the bytes read from the input stream.
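As a minimal sketch of guarding that assumption, the checks below reject implementations whose double clearly is not IEEE 754 binary64 before the union is used. The name read_double_be is hypothetical, the input is assumed to be big-endian binary64, and the checks do not prove that double and uint64_t share the same byte order in memory; they only catch the obvious mismatches.

```c
#include <assert.h>   /* static_assert (C11) */
#include <float.h>    /* FLT_RADIX, DBL_MANT_DIG, DBL_MAX_EXP */
#include <stdint.h>

/* Assumption: the input bytes encode an IEEE 754 binary64 value. */
static_assert(sizeof(double) == sizeof(uint64_t),
              "double must be the same size as uint64_t");
static_assert(FLT_RADIX == 2 && DBL_MANT_DIG == 53 && DBL_MAX_EXP == 1024,
              "double must look like IEEE 754 binary64");

/* Hypothetical helper: read a big-endian binary64 from 8 bytes. */
double read_double_be(const unsigned char *p)
{
    union {
        double d;
        uint64_t i;
    } data;

    data.i = 0;
    for (int k = 0; k < 8; k++) {
        /* data.i is uint64_t, so the shift is done in 64 bits;
           p[k] is unsigned char, so no sign extension can occur. */
        data.i = (data.i << 8) | p[k];
    }
    return data.d;   /* type punning through the union */
}
```

Taking the buffer as const unsigned char * is a design choice here: it makes the & 0xFF mask from the question unnecessary, because the bytes can never be negative.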
Modifying the Bytes of an Object
Modifying an object by modifying its bytes (as by using a pointer to unsigned char) is permitted by C. C 2018 6.5 7 says:
An object shall have its stored value accessed only by an lvalue expression that has one of the following types: [list of various types], or a character type.
Although one of the comments states that you may “access” but not “modify” the bytes of an object this way (apparently interpreting “access” to mean only reading, not writing), C 2018 3.1 defines “access” as:
to read or modify the value of an object.
Thus, one is permitted to read or write the bytes of an object through character types.
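A rough sketch of that kind of access: the bytes of a uint64_t are copied into a double through unsigned char pointers, which is essentially what memcpy does. The name read_double_le_bytes is hypothetical, the input is assumed to be little-endian IEEE 754 binary64, and (as with the union) the host is assumed to store double and uint64_t with the same byte order.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>

static_assert(sizeof(double) == sizeof(uint64_t),
              "double must be the same size as uint64_t");

/* Hypothetical helper: read a little-endian binary64 from 8 bytes. */
double read_double_le_bytes(const unsigned char *p)
{
    /* Assemble the 64-bit pattern; p[7] is the most significant byte. */
    uint64_t bits = 0;
    for (int k = 7; k >= 0; k--) {
        bits = (bits << 8) | p[k];
    }

    /* Copy the object representation byte by byte. Reading and writing
       through a character type is the access that 6.5 7 allows. */
    double d;
    const unsigned char *src = (const unsigned char *)&bits;
    unsigned char *dst = (unsigned char *)&d;
    for (size_t k = 0; k < sizeof d; k++) {
        dst[k] = src[k];
    }
    return d;
}
```

Replacing the copy loop with memcpy(&d, &bits, sizeof d) should behave the same way; the explicit loop is only there to show the character-type access.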
Reading double to platform endianness with union and bit shift, is it safe?
This kind of thing only makes sense when dealing with data from outside the program (e.g. data from a file or network), where you have a strict format for the data (defined in the file format's specification or the network protocol's specification) that may have nothing to do with the format C uses, may have nothing to do with the format the CPU uses, and may not be IEEE 754 format either.
On the other side, C doesn't provide any guarantees at all. For a simple example, it's perfectly legal for the compiler to use a BCD format for float where 0x12345e78 = 1.2345 * 10**78, even if the CPU itself happens to support "IEEE 754".
The result is you have "whatever the spec says format" from outside the program and you're converting that into a different "whatever the compiler felt like format" for use inside the program; and every single assumption you've made (including sizeof(double)) is potentially false.
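One way to avoid depending on the compiler's double format at all is to decode the external format arithmetically. The sketch below assumes the spec says "big-endian IEEE 754 binary64" and rebuilds the value with ldexp instead of reinterpreting bits. The name decode_binary64_be is hypothetical, and the NAN macro is only available where the implementation supports quiet NaNs. On a host whose double has less range or precision than binary64, the result would be rounded or would overflow.

```c
#include <math.h>     /* ldexp, INFINITY, NAN */
#include <stdint.h>

/* Hypothetical helper: decode a big-endian IEEE 754 binary64 value
   without assuming anything about the host's double representation. */
double decode_binary64_be(const unsigned char *p)
{
    uint64_t bits = 0;
    for (int k = 0; k < 8; k++) {
        bits = (bits << 8) | p[k];
    }

    int sign = (int)(bits >> 63);                          /* 1 bit   */
    int exponent = (int)((bits >> 52) & 0x7FF);            /* 11 bits */
    uint64_t fraction = bits & UINT64_C(0xFFFFFFFFFFFFF);  /* 52 bits */
    double value;

    if (exponent == 0x7FF) {
        /* All-ones exponent: infinity or NaN. */
        value = fraction ? NAN : INFINITY;
    } else if (exponent == 0) {
        /* Zero or subnormal: fraction * 2^-1074, no implicit leading 1. */
        value = ldexp((double)fraction, -1074);
    } else {
        /* Normal: (2^52 + fraction) * 2^(exponent - 1023 - 52). */
        value = ldexp((double)(fraction | (UINT64_C(1) << 52)),
                      exponent - 1075);
    }
    return sign ? -value : value;
}
```

This trades speed for portability; on a platform known to use binary64, the union or byte-copy approaches produce the same result with far less work.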
Source: https://stackoverflow.com/questions/52631595/reading-double-to-platform-endianness-with-union-and-bit-shift-is-it-safe