Reinterpreting a union to a different union

时光说笑 2021-02-01 15:16

I have a standard-layout union that has a whole bunch of types in it:

union Big {
    Hdr h;

    A a;
    B b;
    C c;
    D d;
    E e;
    F f;
};

4 answers
  • 2021-02-01 15:29

    I can find no wording in n4296 (draft C++14 standard) that would make this legal. What is more, I cannot even find any wording saying that, given:

    union Big2 {
        Hdr h;
    
        A a;
        B b;
        C c;
        D d;
        E e;
        F f;
    };
    

    we can reinterpret_cast a reference to Big into a reference to Big2 and then use the reference. (Note that Big and Big2 are layout-compatible.)

  • 2021-02-01 15:40

    To be able to take a pointer to A, and reinterpret it as a pointer to B, they must be pointer-interconvertible.

    Pointer-interconvertible is about objects, not types of objects.

    In C++, there are objects at places. If you have a Big at a particular spot with at least one member existing, there is also a Hdr at that same spot due to pointer-interconvertibility.

    However there is no Little object at that spot. If there is no Little object there, it cannot be pointer-interconvertible with a Little object that isn't there.
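To make the distinction concrete, here is a minimal sketch of the case that does work: a union object and its live member. The definitions of `Hdr` and `A` and both function names are assumptions invented for illustration, since the original types are not shown.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the types in the question.
struct Hdr { std::uint32_t type; };
struct A   { Hdr hdr; int value; };

union Big {
    Hdr h;
    A   a;
};

// Reads the header through a pointer obtained by casting the union's address.
std::uint32_t header_type_via_cast(Big& big) {
    // A union object and its non-static data member are pointer-interconvertible,
    // so the cast yields a pointer to the Hdr object that really exists here.
    // This assumes h is the active member when the function is called.
    Hdr* hdr = reinterpret_cast<Hdr*>(&big);
    assert(static_cast<void*>(hdr) == static_cast<void*>(&big.h));
    return hdr->type;
}

std::uint32_t demo_interconvertible() {
    Big big;
    big.h = Hdr{42};  // h becomes the active member
    return header_type_via_cast(big);
}
```

The cast is fine because a Hdr object actually exists at that address. No comparable cast to `Little*` is shown, because no Little object exists there.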

    They appear to be layout-compatible, assuming they are flat data (plain old data, trivially copyable, etc).

    This means you can copy their byte representation and it works. In fact, optimizers seem to understand that a memcpy to a stack local buffer, a placement new (with trivial constructor), then a memcpy back is actually a noop.

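    #include <cstring>     // std::memcpy
    #include <new>         // placement new
    #include <type_traits> // std::is_pod

    template<class T>
    T* laundry_pod( void* data ) {
      static_assert( std::is_pod<T>{}, "POD only" ); // could be relaxed a bit
      char buff[sizeof(T)];
      std::memcpy( buff, data, sizeof(T) ); // save the bytes
      T* r = ::new( data ) T;               // start the lifetime of a T in place
      std::memcpy( data, buff, sizeof(T) ); // restore the bytes into the new T
      return r;
    }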
    

    the above function is a noop at runtime (in an optimized build), yet it converts T-layout-compatible data at data to an actual T.
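A self-contained usage sketch may help. The type definitions and `demo_laundry` are assumptions made up for this example. The `static_assert` uses two narrower traits in place of `std::is_pod`, which is one possible way of relaxing the check, not necessarily the one the answer had in mind.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <new>
#include <type_traits>

// Hypothetical types; the real Hdr, A, B, Big and Little are not shown in the thread.
struct Hdr { std::uint32_t type; };
struct A   { Hdr hdr; double payload[4]; };
struct B   { Hdr hdr; std::uint32_t payload; };
union Big    { Hdr h; A a; B b; };
union Little { Hdr h; B b; };

static_assert(sizeof(Little) <= sizeof(Big) && alignof(Little) <= alignof(Big),
              "a Little must fit in the storage of a Big");

template <class T>
T* laundry_pod(void* data) {
    static_assert(std::is_trivially_copyable<T>::value &&
                  std::is_trivially_default_constructible<T>::value,
                  "flat data only");
    unsigned char buff[sizeof(T)];
    std::memcpy(buff, data, sizeof(T));  // save the bytes
    T* r = ::new (data) T;               // start the lifetime of a T in place
    std::memcpy(data, buff, sizeof(T));  // restore the bytes into the new object
    return r;
}

std::uint32_t demo_laundry() {
    Big big;
    big.h = Hdr{2};                              // h is the active member of Big
    Little* little = laundry_pod<Little>(&big);  // a Little now occupies big's storage
    std::uint32_t seen = little->h.type;         // the header bytes are still there
    Big* again = laundry_pod<Big>(little);       // turn the storage back into a Big
    assert(again->h.type == seen);
    return seen;
}
```

Under these assumptions `demo_laundry()` returns the header value written before the conversion, which is the "same bytes, different object" effect described above.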

    So, if I am right and Big and Little are layout-compatible when the member types of Little are a subset of those in Big, you can do this:

    Little* inplace_to_little( Big* big ) {
      return laundry_pod<Little>(big);
    }
    Big* inplace_to_big( Little* little ) {
      return laundry_pod<Big>(little);
    }
    

    or

    void given_big(Big& big) { // cannot be const
      switch(big.h.type) {
      case B::type: // fallthrough
      case C::type: { // braces: later case labels must not jump past the initialization of little
        auto* little = inplace_to_little(&big); // replace Big object with Little in place
        given_b_or_c(*little);
        inplace_to_big(little); // revive Big object. Old references are valid, barring const data or inheritance
        break;
      }
      // ... other cases here ...
      }
    }
    

    If Big has non-flat data (like references or const data), the above breaks horribly.

    Note that laundry_pod doesn't do any memory allocation; it uses placement new that constructs a T in the place where data points using the bytes at data. And while it looks like it is doing lots of stuff (copying memory around), it optimizes to a noop.


    C++ has a concept of "an object exists". The existence of an object has almost nothing to do with what bits or bytes are written in the physical or abstract machine. There is no instruction in your binary that corresponds to "now an object exists".

    But the language has this concept.

    Objects that don't exist cannot be interacted with. If you do so, the C++ standard does not define the behavior of your program.

    This permits the optimizer to make assumptions about what your code does and what it doesn't do and which branches cannot be reached and which can be reached. It lets the compiler make no-aliasing assumptions; modifying data through a pointer or reference to A cannot change data reached through a pointer or reference to B unless somehow both A and B exist in the same spot.

    The compiler can prove that Big and Little objects cannot both exist in the same spot. So no modification of any data through a pointer or reference to Little could modify anything existing in a variable of type Big. And vice versa.
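The no-aliasing assumption can be sketched with two unrelated types. Here `Big` and `Little` are deliberately simplified to one-field structs, and `store_both` and `demo_aliasing` are hypothetical names. The example only calls the function with two distinct objects, so it stays well-defined.

```cpp
#include <cassert>

// Simplified stand-ins: two distinct types with identical layout.
struct Big    { int foo; };
struct Little { int foo; };

int store_both(Big& big, Little& little) {
    big.foo = 7;
    little.foo = 4;  // a Little cannot live where a Big lives, so big.foo is untouched
    return big.foo;  // an optimizer is allowed to fold this into `return 7;`
}

int demo_aliasing() {
    Big big{0};
    Little little{0};
    int result = store_both(big, little);  // distinct objects: well-defined
    assert(little.foo == 4);
    return result;
}
```

If a caller passed a `Little&` obtained by casting the same `Big`, the compiler would still be entitled to return 7, which is the kind of surprise described next.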

    Imagine that given_b_or_c modifies a field. The compiler could inline given_big and given_b_or_c and use_a_b, notice that no instance of Big is modified (just an instance of Little), and prove that fields of data from Big that it cached prior to calling your code could not have been modified.

    This saves it a load instruction, and the optimizer is quite happy. But now you have code that reads:

    Big b = whatever;
    b.foo = 7;
    ((Little&)b).foo = 4;
    if (b.foo!=4) exit(-1);
    

    that is optimized to

    Big b = whatever;
    b.foo = 7;
    ((Little&)b).foo = 4;
    exit(-1);
    

    because it can prove that b.foo must be 7: it was set once and never modified. The access through Little could not modify the Big due to aliasing rules.

    Now do this:

    Big b = whatever;
    b.foo = 7;
    (*laundry_pod<Little>(&b)).foo = 4;
    Big& b2 = *laundry_pod<Big>(&b);
    if (b2.foo!=4) exit(-1);
    

    and it cannot assume that the Big there was unchanged, because there is a memcpy and a ::new that could legally change the state of the data. No strict aliasing violation.

    It can still follow the memcpy and eliminate it.

    Live example of laundry_pod being optimized away. Note that if it wasn't optimized away, the code would have to have a conditional and a printf. But because it was, it was optimized into the empty program.

  • 2021-02-01 15:43

    This is UB by omission. [expr.ref]/4.2:

    If E2 is a non-static data member and the type of E1 is “cq1 vq1 X”, and the type of E2 is “cq2 vq2 T”, the expression [E1.E2] designates the named member of the object designated by the first expression.

    During the evaluation of the given_b_or_c call in given_big, the object expression in little.h does not actually designate a Little object, and ergo there's no such member. Because the standard "omits any explicit definition of behavior" for this case, the behavior is undefined.

  • 2021-02-01 15:43

    I'm not sure if this really applies here. In the reinterpret_cast - Notes section they talk about pointer-interconvertible objects.

    And from [basic.compound]/4:

    Two objects a and b are pointer-interconvertible if:

    • they are the same object, or
    • one is a union object and the other is a non-static data member of that object, or
    • one is a standard-layout class object and the other is the first non-static data member of that object, or, if the object has no non-static data members, the first base class subobject of that object, or
    • there exists an object c such that a and c are pointer-interconvertible, and c and b are pointer-interconvertible.

    If two objects are pointer-interconvertible, then they have the same address, and it is possible to obtain a pointer to one from a pointer to the other via a reinterpret_cast.
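As a rough illustration of the third and fourth bullets, the sketch below chains two conversions: union to member, then standard-layout struct to its first member. The types and function names are hypothetical, and the code assumes `a` is the active member, so an object really exists at every step of the chain.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-ins for the types in the question.
struct Hdr { std::uint32_t type; };
struct A   { Hdr hdr; int value; };
union Big  { Hdr h; A a; };

static_assert(std::is_standard_layout<A>::value, "A must be standard-layout");

std::uint32_t nested_header_type(Big& big) {
    // Step 1: union object -> its non-static data member (second bullet).
    A* a = reinterpret_cast<A*>(&big);
    // Step 2: standard-layout class object -> its first member (third bullet).
    Hdr* hdr = reinterpret_cast<Hdr*>(a);
    // Together the two steps are an instance of the transitive (last) bullet.
    assert(static_cast<void*>(hdr) == static_cast<void*>(&big.a.hdr));
    return hdr->type;
}

std::uint32_t demo_transitive() {
    Big big;
    big.a = A{Hdr{5}, 10};  // a becomes the active member
    return nested_header_type(big);
}
```

Every bullet in the quoted definition is phrased in terms of objects, so each link in such a chain depends on an object of the target type being present at that address.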

    In this case, we have Hdr h; (c) as a non-static data member in both unions, which should allow the following conversion (by the second and last bullet points):

    Big* (a) -> Hdr* (c) -> Little* (b)
    