Reading double to platform endianness with union and bit shift, is it safe?

Submitted by 末鹿安然 on 2019-12-10 10:37:44

Question


All the examples I've seen of reading a double of known endianness from a buffer to the platform endianness involve detecting the current platform's endianness and performing byte-swapping when necessary.
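For comparison, the detect-and-swap approach described above might look roughly like the following. This is a minimal sketch, not taken from any particular source: it assumes the host is either little- or big-endian, that double is 8 bytes, and that double uses the same byte order as integers. host_is_little_endian and read_double_swap are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Returns true when the host stores the least significant byte first.
   Assumes the host is either little- or big-endian (no mixed orders). */
static bool host_is_little_endian(void) {
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Hypothetical helper: copy the bytes, reverse them if the buffer's byte
   order differs from the host's, then copy the result into a double. */
double read_double_swap(const unsigned char *buffer, bool le) {
    unsigned char tmp[sizeof(double)];
    memcpy(tmp, buffer, sizeof tmp);
    if (le != host_is_little_endian()) {
        for (size_t i = 0; i < sizeof tmp / 2; i++) {
            unsigned char t = tmp[i];
            tmp[i] = tmp[sizeof tmp - 1 - i];
            tmp[sizeof tmp - 1 - i] = t;
        }
    }
    double d;
    memcpy(&d, tmp, sizeof d);
    return d;
}
```

Unlike the shift-based version, this one depends on knowing the host's byte order, which is exactly the detection step the shift technique avoids.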

On the other hand, I've seen another way of doing the same thing for integers, using bit shifting (one such example).
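The integer technique builds the value arithmetically from the bytes, so the host's byte order never enters into it. A minimal sketch for 32-bit values; read_u32_be and read_u32_le are hypothetical names:

```c
#include <stdint.h>

/* Big-endian: buffer[0] is the most significant byte. Each byte is widened
   to uint32_t before shifting, so the shifts are done on an unsigned type. */
uint32_t read_u32_be(const unsigned char *buffer) {
    return ((uint32_t)buffer[0] << 24) |
           ((uint32_t)buffer[1] << 16) |
           ((uint32_t)buffer[2] << 8)  |
            (uint32_t)buffer[3];
}

/* Little-endian: buffer[0] is the least significant byte. */
uint32_t read_u32_le(const unsigned char *buffer) {
    return  (uint32_t)buffer[0]        |
           ((uint32_t)buffer[1] << 8)  |
           ((uint32_t)buffer[2] << 16) |
           ((uint32_t)buffer[3] << 24);
}
```

Because the shifts operate on values rather than on memory, the same code gives the same result on little- and big-endian hosts.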

This got me thinking that it might be possible to use a union and the bitshift technique to read doubles (and floats) from buffers, and a quick test implementation seemed to work (at least with clang on x86_64):

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

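/* Assemble 8 bytes into a uint64_t with shifts, then reinterpret it as a
   double through a union. le == true means buffer[0] is the least
   significant byte. */
double read_double(const unsigned char *buffer, bool le) {
    union {
        double d;
        uint64_t i;
    } data;
    data.i = 0;

    int off = le ? 0 : 7;   /* index of the least significant byte */
    int add = le ? 1 : -1;  /* step towards more significant bytes */
    for (int i = 0; i < 8; i++) {
        /* widen to uint64_t before shifting so shifts of up to 56 bits are valid */
        data.i |= ((uint64_t)(buffer[off] & 0xFF) << (i * 8));
        off += add;
    }
    return data.d;
}

int main(void) {
    unsigned char buffer_le[] = {0x6E, 0x86, 0x1B, 0xF0, 0xF9, 0x21, 0x09, 0x40};
    printf("%f\n", read_double(buffer_le, true)); // 3.141590

    unsigned char buffer_be[] = {0x40, 0x09, 0x21, 0xF9, 0xF0, 0x1B, 0x86, 0x6E};
    printf("%f\n", read_double(buffer_be, false)); // 3.141590

    return 0;
}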

My question though is, is this a safe way to do this? Or is there undefined behavior involved here? Or if both this and the byte-swap method involve undefined behavior, is one safer than the other?


Answer 1:


Reinterpreting Through a Union

Constructing a uint64_t value by shifting and ORing bytes is of course supported by the C standard. (Shifting carries some hazard: the left operand must have a suitable type and width to avoid overflow and shift counts that are too large, but the code in the question correctly converts to uint64_t before shifting.) The remaining question is whether reinterpreting through a union is permitted by the C standard. The answer is yes.

C 6.5.2.3 3 says:

A postfix expression followed by the . operator and an identifier designates a member of a structure or union object. The value is that of the named member,99)

and note 99 says:

If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning")…
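A minimal sketch of the reinterpretation this footnote describes, assuming the host's float is a 32-bit IEEE 754 type (C does not require this); float_bits is a hypothetical name:

```c
#include <stdint.h>

/* Store through one union member, read through another: the float's object
   representation is reinterpreted as a uint32_t.
   Assumes sizeof(float) == sizeof(uint32_t). */
uint32_t float_bits(float f) {
    union {
        float f;
        uint32_t u;
    } pun;
    pun.f = f;
    return pun.u;
}
```

On an IEEE 754 host, float_bits(1.0f) yields 0x3F800000.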

Such reinterpretation of course relies on the object representations used by the C implementation. Notably, the double must use the expected format, matching the bytes read from the input stream.
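One way to avoid silently depending on those representations is to check them: sizes at compile time and the encoding of a known value at run time. This is a sketch assuming C11 (for _Static_assert); double_format_is_binary64 is a hypothetical name. The probe value 0x1.23456789abcdep0 is chosen so that all eight bytes of its binary64 encoding differ, which means any byte reordering would be detected.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

_Static_assert(sizeof(double) == sizeof(uint64_t),
               "double and uint64_t must be the same size");

/* True if the probe value has the IEEE 754 binary64 bit pattern when viewed
   as a uint64_t, i.e. double is binary64 and uses the same byte order as
   uint64_t. */
bool double_format_is_binary64(void) {
    const double probe = 0x1.23456789abcdep0;
    uint64_t bits;
    memcpy(&bits, &probe, sizeof bits);
    return bits == UINT64_C(0x3FF23456789ABCDE);
}
```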

Modifying the Bytes of an Object

Modifying an object by modifying its bytes (as by using a pointer to unsigned char) is permitted by C. C 2018 6.5 7 says:

An object shall have its stored value accessed only by an lvalue expression that has one of the following types: [list of various types], or a character type.

Although one of the comments states that you may “access” but not “modify” the bytes of an object this way (apparently interpreting “access” to mean only reading, not writing), C 2018 3.1 defines “access” as:

to read or modify the value of an object.

Thus, one is permitted to read or write the bytes of an object through character types.
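As a sketch of what this permission allows: the bytes of a double can be written one at a time through an unsigned char pointer, here copied from a uint64_t assembled with shifts, so a byte copy could stand in for the question's union. It makes the same assumptions as the union version (double and uint64_t have the same size and byte order); bits_to_double is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

_Static_assert(sizeof(double) == sizeof(uint64_t), "size mismatch");

/* Copy the object representation of bits into d, byte by byte, using
   character-typed lvalues (permitted by C 2018 6.5 7). */
double bits_to_double(uint64_t bits) {
    double d;
    const unsigned char *src = (const unsigned char *)&bits;
    unsigned char *dst = (unsigned char *)&d;
    for (size_t i = 0; i < sizeof d; i++) {
        dst[i] = src[i];
    }
    return d;
}
```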




Answer 2:



This kind of thing only makes sense when dealing with data from outside the program (e.g. data from a file or network), where you have a strict format for the data (defined in the file format's specification or the network protocol's specification). That format may have nothing to do with the format C uses, may have nothing to do with the format the CPU uses, and may not be IEEE 754 either.
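If the external specification says a field is, for example, a big-endian IEEE 754 binary64 value, one way to respect that separation is to decode the fields arithmetically with ldexp, so the result does not depend on how the compiler represents double. A sketch under that assumption; decode_binary64_be is a hypothetical name, it assumes the host provides NAN and INFINITY, and values the host double cannot represent exactly will be rounded.

```c
#include <math.h>
#include <stdint.h>

/* Decode an IEEE 754 binary64 value stored big-endian in buffer[0..7]
   without relying on the host's own double layout or byte order. */
double decode_binary64_be(const unsigned char *buffer) {
    uint64_t bits = 0;
    for (int i = 0; i < 8; i++) {
        bits = (bits << 8) | buffer[i];
    }
    int sign = (int)(bits >> 63);
    int exponent = (int)((bits >> 52) & 0x7FF);
    uint64_t fraction = bits & UINT64_C(0xFFFFFFFFFFFFF);

    double value;
    if (exponent == 0x7FF) {
        /* all-ones exponent: infinity or NaN (NaN payload is not preserved) */
        value = fraction ? NAN : INFINITY;
    } else if (exponent == 0) {
        /* zero or subnormal: fraction * 2^(1 - 1023 - 52) */
        value = ldexp((double)fraction, -1074);
    } else {
        /* normal: (2^52 + fraction) * 2^(exponent - 1023 - 52) */
        value = ldexp((double)(fraction | (UINT64_C(1) << 52)), exponent - 1075);
    }
    return sign ? -value : value;
}
```

This is slower than a union or memcpy, but on a host whose double is binary64 it produces the same values for all non-NaN inputs while making no assumption about that host format.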

On the other side, C itself doesn't guarantee any particular floating-point format. For a simple example, it's perfectly legal for the compiler to use a BCD format for float, where 0x12345e78 = 1.2345 * 10**78, even if the CPU itself happens to support IEEE 754.
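What an implementation actually uses can at least be inspected: <float.h> describes its floating-point model, and an implementation that defines __STDC_IEC_559__ claims conformance to Annex F (IEC 60559, i.e. IEEE 754). A small sketch; the function names are hypothetical, and the checks are only as trustworthy as the implementation's own claims.

```c
#include <float.h>
#include <stdbool.h>

/* True if <float.h> describes double with binary64's parameters: radix 2,
   a 53-bit significand and binary64's exponent range. This does not prove
   the bit layout or byte order. */
bool double_looks_like_binary64(void) {
    return FLT_RADIX == 2 && DBL_MANT_DIG == 53 &&
           DBL_MAX_EXP == 1024 && DBL_MIN_EXP == -1021;
}

/* True if the implementation claims Annex F (IEC 60559) conformance. */
bool claims_iec_559(void) {
#ifdef __STDC_IEC_559__
    return true;
#else
    return false;
#endif
}
```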

The result is that you have data in "whatever format the spec says" coming from outside the program, and you're converting it into a different "whatever format the compiler felt like using" for use inside the program; every single assumption you've made (including sizeof(double)) is potentially false.



Source: https://stackoverflow.com/questions/52631595/reading-double-to-platform-endianness-with-union-and-bit-shift-is-it-safe
