Unions and type-punning

后端 未结 5 2007
广开言路
广开言路 2020-11-22 04:46

I\'ve been searching for a while, but can\'t find a clear answer.

Lots of people say that using unions to type-pun is undefined and bad practice. Why is this? I can\

5条回答
  •  南笙
    南笙 (楼主)
    2020-11-22 05:18

    There are (or at least were, back in C90) two modivations for making this undefined behavior. The first was that a compiler would be allowed to generate extra code which tracked what was in the union, and generated a signal when you accessed the wrong member. In practice, I don't think any one ever did (maybe CenterLine?). The other was the optimization possibilities this opened up, and these are used. I have used compilers which would defer a write until the last possible moment, on the grounds that it might not be necessary (because the variable goes out of scope, or there is a subsequent write of a different value). Logically, one would expect that this optimization would be turned off when the union was visible, but it wasn't in the earliest versions of Microsoft C.

    The issues of type punning are complex. The C committee (back in the late 1980's) more or less took the position that you should use casts (in C++, reinterpret_cast) for this, and not unions, although both techniques were widespread at the time. Since then, some compilers (g++, for example) have taken the opposite point of view, supporting the use of unions, but not the use of casts. And in practice, neither work if it is not immediately obvious that there is type-punning. This might be the motivation behind g++'s point of view. If you access a union member, it is immediately obvious that there might be type-punning. But of course, given something like:

    int f(const int* pi, double* pd)
    {
        int results = *pi;
        *pd = 3.14159;
        return results;
    }
    

    called with:

    union U { int i; double d; };
    U u;
    u.i = 1;
    std::cout << f( &u.i, &u.d );
    

    is perfectly legal according to the strict rules of the standard, but fails with g++ (and probably many other compilers); when compiling f, the compiler assumes that pi and pd can't alias, and reorders the write to *pd and the read from *pi. (I believe that it was never the intent that this be guaranteed. But the current wording of the standard does guarantee it.)

    EDIT:

    Since other answers have argued that the behavior is in fact defined (largely based on quoting a non-normative note, taken out of context):

    The correct answer here is that of pablo1977: the standard makes no attempt to define the behavior when type punning is involved. The probable reason for this is that there is no portable behavior that it could define. This does not prevent a specific implementation from defining it; although I don't remember any specific discussions of the issue, I'm pretty sure that the intent was that implementations define something (and most, if not all, do).

    With regards to using a union for type-punning: when the C committee was developing C90 (in the late 1980's), there was a clear intent to allow debugging implementations which did additional checking (such as using fat pointers for bounds checking). From discussions at the time, it was clear that the intent was that a debugging implementation might cache information concerning the last value initialized in a union, and trap if you tried to access anything else. This is clearly stated in §6.7.2.1/16: "The value of at most one of the members can be stored in a union object at any time." Accessing a value that isn't there is undefined behavior; it can be assimilated to accessing an uninitialized variable. (There were some discussions at the time as to whether accessing a different member with the same type was legal or not. I don't know what the final resolution was, however; after around 1990, I moved on to C++.)

    With regards to the quote from C89, saying the behavior is implementation-defined: finding it in section 3 (Terms, Definitions and Symbols) seems very strange. I'll have to look it up in my copy of C90 at home; the fact that it has been removed in later versions of the standards suggests that its presence was considered an error by the committee.

    The use of unions which the standard supports is as a means to simulate derivation. You can define:

    struct NodeBase
    {
        enum NodeType type;
    };
    
    struct InnerNode
    {
        enum NodeType type;
        NodeBase* left;
        NodeBase* right;
    };
    
    struct ConstantNode
    {
        enum NodeType type;
        double value;
    };
    //  ...
    
    union Node
    {
        struct NodeBase base;
        struct InnerNode inner;
        struct ConstantNode constant;
        //  ...
    };
    

    and legally access base.type, even though the Node was initialized through inner. (The fact that §6.5.2.3/6 starts with "One special guarantee is made..." and goes on to explicitly allow this is a very strong indication that all other cases are meant to be undefined behavior. And of course, there is the statement that "Undefined behavior is otherwise indicated in this International Standard by the words ‘‘undefined behavior’’ or by the omission of any explicit definition of behavior" in §4/2; in order to argue that the behavior is not undefined, you have to show where it is defined in the standard.)

    Finally, with regards to type-punning: all (or at least all that I've used) implementations do support it in some way. My impression at the time was that the intent was that pointer casting be the way an implementation supported it; in the C++ standard, there is even (non-normative) text to suggest that the results of a reinterpret_cast be "unsurprising" to someone familiar with the underlying architecture. In practice, however, most implementations support the use of union for type-punning, provided the access is through a union member. Most implementations (but not g++) also support pointer casts, provided the pointer cast is clearly visible to the compiler (for some unspecified definition of pointer cast). And the "standardization" of the underlying hardware means that things like:

    int
    getExponent( double d )
    {
        return ((*(uint64_t*)(&d) >> 52) & 0x7FF) + 1023;
    }
    

    are actually fairly portable. (It won't work on mainframes, of course.) What doesn't work are things like my first example, where the aliasing is invisible to the compiler. (I'm pretty sure that this is a defect in the standard. I seem to recall even having seen a DR concerning it.)

提交回复
热议问题